Gupta et al.: Boosting for regression transfer via importance sampling

Gupta S, Bi J, Liu Y, Wildani A. 2023. Boosting for regression transfer via importance sampling. Int J Data Sci Anal.

Instance transfer learning methodologies are extremely efficient for continuous-valued regression datasets. However, these methodologies can suffer negative transfer due to distribution shifts between the training and test data, as well as skewness in training caused by the much larger source dataset. To mitigate this, we introduce S-TrAdaBoost.R2, a boosting-based instance transfer learning methodology that uses importance sampling to reduce the skewness in training and a balanced weighting approach to handle the distribution shift. We tested our approach on 8 standard regression datasets of varying complexity and found that S-TrAdaBoost.R2 performs better than competing transfer learning methodologies 63% of the time. Moreover, it displayed consistent performance, as opposed to the sporadic results observed for other transfer learning methodologies.
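The abstract does not spell out the sampling step, so the following is only a generic illustration of importance sampling, not the authors' S-TrAdaBoost.R2 procedure. It assumes each source instance already carries a non-negative weight and draws a smaller subsample with probability proportional to that weight, which is one common way to keep a large source set from dominating training. The function name `importance_resample` and its parameters are hypothetical.

```python
import numpy as np


def importance_resample(weights, n_samples, seed=0):
    """Draw instance indices with probability proportional to their weights.

    Hypothetical helper for illustration only; the weights are assumed to
    come from some earlier boosting or density-ratio step.
    """
    w = np.asarray(weights, dtype=float)
    if w.ndim != 1 or w.size == 0:
        raise ValueError("weights must be a non-empty 1-D sequence")
    if np.any(w < 0) or w.sum() <= 0:
        raise ValueError("weights must be non-negative with a positive sum")

    # Normalize the weights into a sampling distribution.
    probabilities = w / w.sum()

    # Sample with replacement so heavily weighted instances can repeat.
    rng = np.random.default_rng(seed)
    return rng.choice(w.size, size=n_samples, replace=True, p=probabilities)


if __name__ == "__main__":
    source_weights = [0.0, 1.0, 3.0]
    picked = importance_resample(source_weights, n_samples=10)
    print(picked)
```

Instances with zero weight are never drawn, and instances with larger weights appear more often in the subsample.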

Article link

Pruthi, D., Liu, Y.: Low-cost nature-inspired deep learning system for PM2.5 forecast over Delhi

Pruthi, D., Liu, Y. (2022). Low-cost nature-inspired deep learning system for PM2.5 forecast over Delhi, India. Environment International, 166, 107373.

Science Direct Link

Air quality models are crucial tools for surveying and projecting air pollution episodes, which can be used to issue health advisories to act ahead of time. Short-term increases in air pollution trigger many adverse health events; a fast, efficient, cost-effective, and reliable air quality prediction model would aid in minimizing the effect on health and prosperity. Despite advances in certain nations in lowering air pollution exposure, the global health impact of ambient fine particulate matter (PM2.5) is growing every year. Short-term spikes in pollution cause several negative health outcomes, notably cardiovascular and respiratory-related fatalities, hospitalizations, and emergency room visits. In this study, we developed a low-cost hybrid model that combines wavelet, adaptive network fuzzy inference system (ANFIS), and particle swarm optimization (PSO) for the short-term PM2.5 forecast.

Air pollution, particularly a high level of PM2.5, is a significant problem in Delhi and has attracted considerable attention from the Government of India (GOI). GOI has initiated various programs to reduce surface PM2.5 concentrations and their impact in Delhi. One such program is the Graded Response Action Plan (GRAP), which prescribes a set of measures depending on the current level of pollution. The Central Pollution Control Board, the Delhi Pollution Control Board, and the India Meteorological Department are the major organizations monitoring air pollution in Delhi. Using this dense network of monitoring stations, our model forecasts PM2.5 concentration from historical concentrations.

In this study, we have included the attributes desired in air quality models: low computational time (approximately 7 min on an i5-1035G1, 1.19 GHz processor), low resource requirements (dependent only on lagged pollutant values), and high spatial resolution (1 km) for predicting air quality three days in advance. The model predictions show significant correlation with Central Pollution Control Board (CPCB) monitored data at various sites in Delhi, with correlation coefficients in [0.96, 0.98], [0.86, 0.93], and [0.82, 0.91] for one-, two-, and three-day forecasts, respectively. We interpolate PM2.5 over Delhi using the inverse distance weighting (IDW) method to provide predictions for regions without a monitoring site. The model performed well in capturing the spatial distribution of the three-day PM2.5 forecast over Delhi, as confirmed using PM2.5 obtained from MODIS AOD at 1 km. Precise PM2.5 forecasting will lead to accurate AQI prediction, which will help both local forecasters and model developers.
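As a rough illustration of the inverse distance weighting step mentioned above, the sketch below estimates a value at unmonitored points as a weighted average of station values, with weights proportional to 1 / distance^power. It is a minimal textbook form, not the paper's implementation: the function name `idw_interpolate`, the exponent of 2, and the use of planar Euclidean distance on already-projected coordinates are all assumptions.

```python
import numpy as np


def idw_interpolate(station_xy, station_values, query_xy, power=2.0):
    """Inverse distance weighting on planar coordinates (e.g. km).

    Hypothetical helper: power=2 and Euclidean distance are assumptions,
    not settings taken from the study.
    """
    station_xy = np.asarray(station_xy, dtype=float)
    station_values = np.asarray(station_values, dtype=float)
    query_xy = np.asarray(query_xy, dtype=float)

    # Pairwise distances with shape (n_query, n_station).
    diff = query_xy[:, None, :] - station_xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    estimates = np.empty(len(query_xy))
    for i, d in enumerate(dist):
        at_station = d == 0.0
        if at_station.any():
            # A query point on top of a station takes that station's value.
            estimates[i] = station_values[at_station].mean()
        else:
            weights = 1.0 / d ** power
            estimates[i] = (weights * station_values).sum() / weights.sum()
    return estimates


if __name__ == "__main__":
    # Made-up station positions (km) and PM2.5 values, for illustration only.
    stations = [(0.0, 0.0), (2.0, 0.0)]
    pm25 = [100.0, 200.0]
    grid = [(1.0, 0.0), (0.5, 0.0), (0.0, 0.0)]
    print(idw_interpolate(stations, pm25, grid))
```

A point midway between two stations receives the plain average of their values, and the estimate moves toward a station's value as the point approaches it.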

CPCB monitoring stations in Delhi considered for the study
Predicted vs. Observed Data (a) Trained PM2.5 data (b) Day 1 Prediction (c) Day 2 Prediction (d) Day 3 Prediction for CPCB monitoring stations