Zhang et al.: Wildland Fires Worsened Population Exposure to PM2.5 Pollution in the Contiguous United States

Danlu Zhang, Wenhao Wang, Yuzhi Xi, Jianzhao Bi, Yun Hang, Qingyang Zhu, Qiang Pu, Howard Chang, and Yang Liu (2023). Wildland Fires Worsened Population Exposure to PM2.5 Pollution in the Contiguous United States. Environmental Science & Technology Article ASAP

Science Direct: link

As wildland fires become more frequent and intense, fire smoke has significantly worsened the ambient air quality, posing greater health risks. To better understand the impact of wildfire smoke on air quality, we developed a modeling system to estimate daily PM2.5 concentrations attributed to both fire smoke and non-smoke sources across the contiguous U.S. We found that wildfire smoke has the most significant impact on air quality on the West Coast, followed by the Southeastern U.S. Between 2007 and 2018, fire smoke contributed over 25% of daily PM2.5 concentrations at ∼40% of all regulatory air monitors in the EPA’s air quality system (AQS) for more than one month per year. People residing outside the vicinity of an EPA AQS monitor (defined by a 5 km radius) were subject to 36% more smoke impact days compared with those residing nearby. Lowering the national ambient air quality standard (NAAQS) for annual mean PM2.5 concentrations to between 9 and 10 μg/m3 would result in approximately 35–49% of the AQS monitors falling in nonattainment areas, taking into account the impact of fire smoke. If fire smoke contribution is excluded, this percentage would be reduced by 6 and 9%, demonstrating the significant negative impact of wildland fires on air quality.
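
The smoke-impact metric mentioned in the abstract (days on which smoke contributes more than 25% of daily PM2.5) can be illustrated with a minimal Python sketch. The function name, the input layout, and the sample values are hypothetical and are not taken from the paper's modeling system.

```python
def count_smoke_impact_days(total_pm25, smoke_pm25, threshold=0.25):
    """Count days on which smoke exceeds `threshold` as a share of total PM2.5."""
    if len(total_pm25) != len(smoke_pm25):
        raise ValueError("both series must have the same length")
    days = 0
    for total, smoke in zip(total_pm25, smoke_pm25):
        # Skip days with no measurable PM2.5 to avoid dividing by zero.
        if total > 0 and smoke / total > threshold:
            days += 1
    return days


# Hypothetical daily values in ug/m3 for one monitor (not data from the study).
total = [8.0, 12.0, 35.0, 10.0, 50.0]
smoke = [1.0, 4.0, 20.0, 2.5, 45.0]
impact_days = count_smoke_impact_days(total, smoke)
```

A day on which smoke makes up exactly 25% of the total is not counted here, because the comparison is strict; whether the study treats that boundary the same way is an assumption.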

Gupta et al.: Boosting for regression transfer via importance sampling

Gupta S, Bi J, Liu Y, Wildani A. 2023. Boosting for regression transfer via importance sampling. Int J Data Sci Anal. https://doi.org/10.1007/s41060-023-00414-8.

Instance transfer learning methodologies are extremely efficient for continuous-valued, regression datasets. However, these methodologies can suffer negative transfer due to distribution shifts between the training and test data as well as skewness in training caused by the large-sampled source dataset. To mitigate this, we introduce S-TrAdaBoost.R2, a boosting-based instance transfer learning methodology that utilizes importance sampling to reduce the skewness in training and a balanced weighting approach for the distribution shift. We tested the performance of our approach on 8 standard regression datasets with varying complexities and found that S-TrAdaBoost.R2 performs better than the competitive transfer learning methodologies 63% of the time. Moreover, it also displayed consistent performance, as opposed to the sporadic results observed for other transfer learning methodologies.
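
As a rough illustration of how sampling by instance weight can shrink a large source dataset, the sketch below draws source instances with probability proportional to their weights. It is a generic weighted-resampling example, not the S-TrAdaBoost.R2 algorithm; the function name, the weights, and the instance labels are all hypothetical, and how the paper actually computes the weights is not described here.

```python
import random


def weighted_resample(instances, weights, k, seed=0):
    """Draw k instances with replacement, with probability proportional to weight."""
    if len(instances) != len(weights):
        raise ValueError("instances and weights must have the same length")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative and not all zero")
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    return rng.choices(instances, weights=weights, k=k)


# Hypothetical source instances; a zero weight means "never select".
source = ["src0", "src1", "src2", "src3", "src4", "src5"]
weights = [0.30, 0.00, 0.25, 0.05, 0.30, 0.10]
subset = weighted_resample(source, weights, k=4)
```

Instances with larger weights are more likely to appear in `subset`, and an instance with zero weight is never drawn.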

Article link

Huang et al.: Satellite-Based Long-Term Spatiotemporal Trends in Ambient NO2 Concentrations and Attributable Health Burdens in China From 2005 to 2020.

Keyong Huang, Qingyang Zhu, Xiangfeng Lu, Dongfeng Gu, Yang Liu. (2023). Satellite-Based Long-Term Spatiotemporal Trends in Ambient NO2 Concentrations and Attributable Health Burdens in China From 2005 to 2020. GeoHealth, 2023, 7(5): e2023GH000798. doi: 10.1029/2023GH000798.


Read online: https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2023GH000798


Previous studies about air pollution in China focused on fine particulate matter and ozone. As one of the major NOx-emitting countries worldwide, China has experienced serious air pollution in recent decades. However, the long-term spatiotemporal distribution of NO2 levels, especially before 2013, was still unknown. In the current study, we first gap-filled the missing OMI satellite NO2 values, then developed an ensemble machine learning model to estimate the spatiotemporal pattern of monthly mean NO2 concentrations at 0.05° spatial resolution from 2005 to 2020 in China, and further estimated its attributable disease burden. The ensemble model predictions had good agreement with observations, and the sample-based, temporal and spatial cross-validation (CV) R2 were 0.88, 0.82 and 0.73, respectively. In addition, our model can provide accurate historical NO2 concentrations, with both by-year CV R2 and external separate year validation R2 achieving 0.80. The estimated national NO2 levels showed an increasing trend during 2005-2011, then decreased gradually until 2020, especially in 2012-2015. The annual mortality burden attributable to NO2 exposure ranged from 305 thousand to 416 thousand, and varied considerably across provinces in China. Our model could provide reliable long-term NO2 predictions at a high spatial resolution for epidemiological studies in China. Our results also highlight the heavy disease burden attributable to NO2 and call for policies to reduce NOx emissions in China.
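
To make the validation terms above concrete, here is a minimal Python sketch of an R2 calculation and of a grouped fold assignment such as a spatial CV might use, where all records from one station are held out together. It assumes R2 means the coefficient of determination and that folds are built by station; the paper may define both differently, and all names and values are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def spatial_folds(station_ids, n_folds):
    """Assign each record a fold so that one station never spans two folds."""
    unique_stations = sorted(set(station_ids))
    fold_of_station = {s: i % n_folds for i, s in enumerate(unique_stations)}
    return [fold_of_station[s] for s in station_ids]


# Hypothetical records: two monthly values from each of three stations.
stations = ["A", "A", "B", "B", "C", "C"]
folds = spatial_folds(stations, n_folds=3)

perfect = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mean_only = r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

A temporal CV can be sketched the same way by grouping on the time period instead of the station.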


By-year cross validation and external validation using data from year 2020.


Annual mean mortality burden attributable to long-term NO2 exposure in China. (a), annual NO2 related mortality burden per 100,000 persons at provincial level in 2005, 2010, 2015, and 2020. (b), national mean NO2 related mortality burden from 2005 to 2020.

Meng and Hang et al.: A satellite-driven model to estimate long-term particulate sulfate levels and attributable mortality burden in China

Meng, X., Hang, Y., Lin, X., Li, T., Wang, T., Cao, J., Fu, Q., Dey, S., Huang, K., Liang, F. and Kan, H., 2023. A satellite-driven model to estimate long-term particulate sulfate levels and attributable mortality burden in China. Environment International, p.107740.

Science Direct: Link

Ambient fine particulate matter (PM2.5) pollution is a major environmental and public health challenge in China. In the recent decade, the PM2.5 level has dramatically decreased, mainly owing to reductions in particulate sulfate. This is because desulfurization, a process to remove sulfur (a precursor of sulfate) from coal-fired power plants and industrial facilities, is one of the major air quality policies in China. Therefore, it is necessary to characterize the variation of sulfate, as it is a strong indicator for assessing existing efforts toward improving air quality. However, estimating the long-term spatiotemporal trend of sulfate is challenging because a ground monitoring network of PM2.5 constituents has not been established. Fortunately, spaceborne sensors such as the Multi-angle Imaging SpectroRadiometer (MISR) instrument can provide complementary information on aerosol types. With the help of state-of-the-art machine learning techniques, we developed a sulfate prediction model supported by available ground measurements, MISR’s aerosol optical depth data, and other atmospheric reanalysis data at a spatial resolution of 10 km. Compared with classical chemical transport models, our sulfate model performs better with a random cross-validation R2 of 0.9 at the monthly level. We found that the national mean population-weighted sulfate concentration was relatively stable before the Air Pollution Prevention and Control Action Plan was enacted in 2013, but dramatically decreased by 28.7% from 2013 to 2018. Correspondingly, the total non-accidental and cardiopulmonary deaths attributed to sulfate decreased by 40.7% and 42.3%, respectively. Our study’s reliable sulfate estimates will advance future studies on evaluating air quality policies and understanding the adverse health effects of particulate sulfate.
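
The population-weighted concentration and the percentage change quoted above follow simple formulas, sketched below in Python. The grid-cell values are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from the study.

```python
def population_weighted_mean(concentrations, populations):
    """Average concentration in which each grid cell counts by its population."""
    total_population = sum(populations)
    if total_population <= 0:
        raise ValueError("total population must be positive")
    weighted_sum = sum(c * p for c, p in zip(concentrations, populations))
    return weighted_sum / total_population


def percent_change(start, end):
    """Relative change from start to end, in percent (negative means a decrease)."""
    return (end - start) / start * 100.0


# Hypothetical sulfate levels (ug/m3) and populations for two grid cells.
pw_mean = population_weighted_mean([10.0, 20.0], [3000, 1000])
change = percent_change(12.5, 10.0)
```

Here the more populous cell pulls the weighted mean (12.5) below the simple mean (15.0), which is why population weighting matters for exposure estimates.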

Pruthi, D., Liu, Y.: Low-cost nature-inspired deep learning system for PM2.5 forecast over Delhi

Pruthi, D., Liu, Y. (2022). Low-cost nature-inspired deep learning system for PM2.5 forecast over Delhi, India. Environment International,166, 107373.

Science Direct Link

Air quality models are crucial tools for surveying and projecting air pollution episodes, which can be used to issue health advisories to act ahead of time. Short-term increases in air pollution trigger many adverse health events; a fast, efficient, cost-effective, and reliable air quality prediction model would aid in minimizing the effect on health and prosperity. Despite advances in certain nations in lowering air pollution exposure, the global health impact of ambient fine particulate matter (PM2.5) is growing every year. Short-term spikes in pollution cause several negative health outcomes, notably cardiovascular and respiratory-related fatalities, hospitalizations, and emergency room visits. In this study, we developed a low-cost hybrid model that combines wavelet, adaptive network fuzzy inference system (ANFIS), and particle swarm optimization (PSO) for the short-term PM2.5 forecast.

Air pollution, particularly a high level of PM2.5, is a significant problem in Delhi, attracting significant attention from the Government of India (GOI). GOI has initiated various programs to reduce surface PM2.5 concentrations and their impact in Delhi. One such program is the Graded Response Action Plan (GRAP), which prescribes a set of measures depending on the current level of pollution. The Central Pollution Control Board, Delhi Pollution Control Board, and India Meteorological Department are the major organizations monitoring air pollution in Delhi. Using this dense network of monitoring stations, our model forecasts PM2.5 concentration based on historical concentrations.

In this study, we have included the attributes desired in air quality models, i.e., low computational time (approximately 7 min using an I5-1035G1, 1.19 GHz processor), low resource requirements (dependent only on the pollutant lagged values), and high spatial resolution (1 km) for predicting the air quality three days in advance. The model predictions show significant correlation coefficients lying in [0.96,0.98], [0.86,0.93], and [0.82,0.91] with Central Pollution Control Board (CPCB) monitored data at various sites in Delhi for one, two, and three days of forecast, respectively. We interpolate PM2.5 over Delhi using the inverse distance weighting (IDW) method to make predictions available for regions without any monitoring site. The model performed well in capturing the spatial distribution of the three-day PM2.5 forecast over Delhi, which was confirmed using PM2.5 obtained from MODIS AOD at 1 km. The precision of PM2.5 forecasting will lead to accurate AQI prediction, which will help both the local forecasters and model developers.
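
Inverse distance weighting, the interpolation method named above, can be sketched in a few lines of Python. The sketch assumes planar coordinates in kilometers and a distance exponent of 2; neither choice is stated in the abstract, and the station values are hypothetical.

```python
import math


def idw_interpolate(stations, target, power=2.0):
    """Estimate a value at `target` from (x, y, value) stations using IDW."""
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in stations:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            # The target sits on a station, so return its value directly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical stations: (x_km, y_km, PM2.5 in ug/m3).
stations = [(0.0, 0.0, 100.0), (2.0, 0.0, 200.0)]
midpoint_value = idw_interpolate(stations, (1.0, 0.0))
on_station_value = idw_interpolate(stations, (0.0, 0.0))
```

A larger `power` makes nearby stations dominate more strongly, so the interpolated surface becomes more local.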

CPCB monitoring stations in Delhi considered for the study
Predicted vs. Observed Data (a) Trained PM2.5 data (b) Day 1 Prediction (c) Day 2 Prediction (d) Day 3 Prediction for CPCB monitoring stations



Vu et al.: Application of geostationary satellite and high-resolution meteorology data in estimating hourly PM2.5 levels during the Camp Fire episode in California

Vu, B.N., Bi, J., Wang, W., Huff, A., Kondragunta, S., Liu, Y. (2022). Application of geostationary satellite and high-resolution meteorology data in estimating hourly PM2.5 levels during the Camp Fire episode in California. Remote Sensing of Environment, 271, 112890.

Science Direct: Link

Particulate matter from wildland fire smoke can travel hundreds of kilometers from where it originated and result in excess morbidity and mortality. However, estimating PM2.5 levels in and around wildland fires poses many challenges, including a lack of monitors at or near smoke plumes and limited temporal coverage, since monitors generally obtain daily measurements, yet smoke plumes continually change throughout the day due to wind. In this paper, we estimated total PM2.5 concentration during the Camp Fire episode, one of the deadliest wildland fires in California history. We used a random forest (RF) model to calibrate hourly regulatory monitor measurements (AQS maintained by the EPA) and low-cost sensors (PurpleAir) to Geostationary Operational Environmental Satellite-16 (GOES-16)’s AOD and fire spot detection variables, High-Resolution Rapid Refresh (HRRR) meteorology, and ancillary variables including land-use, distance to road, and a convolutional layer calculated for each hour. We tried three separate approaches: 1) an AQS-only model; 2) AQS + weighted PurpleAir model; and 3) AQS + weighted PurpleAir + SMOTE model. Synthetic Minority-Oversampling Technique (SMOTE) was applied to bolster the number of extremely high observations and improve model performance. The AQS-only model achieved an out-of-bag (OOB) R2 (RMSE) of 0.84 (12.00 μg/m3) and a spatial and temporal cross-validation (CV) R2 (RMSE) of 0.74 (16.28 μg/m3) and 0.73 (16.58 μg/m3), respectively. The AQS + weighted PurpleAir model’s OOB and spatial and temporal CV R2 (RMSE) were 0.86 (9.52 μg/m3), 0.75 (14.93 μg/m3), and 0.79 (11.89 μg/m3), respectively. The AQS + weighted PurpleAir + SMOTE model’s OOB and spatial and temporal CV R2 (RMSE) were 0.92 (10.44 μg/m3), 0.84 (12.36 μg/m3), and 0.85 (14.88 μg/m3), respectively.
Results from this study indicate that increased measurements as well as the application of SMOTE improve not only model performance but also residual errors, and may aid in epidemiological studies investigating intense and acute exposure to wildland fire smoke.
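
The core idea of SMOTE is to create synthetic samples by interpolating between a rare observation and one of its nearest rare neighbors. The Python sketch below shows that interpolation step only; it is not the study's implementation, and the function name, the choice of `k`, and the sample points are hypothetical.

```python
import math
import random


def smote_samples(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority points")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Hypothetical extreme observations: (PM2.5 in ug/m3, AOD).
minority = [(150.0, 0.8), (180.0, 1.1), (210.0, 1.5), (300.0, 2.0)]
synthetic = smote_samples(minority, n_new=4)
```

Because each synthetic point lies on a segment between two real points, it always stays within the range spanned by the original extreme observations.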


Density Scatter Plot
Density Scatter Plots of spatial and temporal CV from left to right: Top (spatial CVs for AQS-only, AQS + weighted PurpleAir, AQS + weighted PurpleAir + SMOTE); Bottom (temporal CVs for AQS-only, AQS + weighted PurpleAir, AQS + weighted PurpleAir + SMOTE)


Blowout Maps
Prediction Maps at noon on November 16th, 2018. Blowouts from left to right: AQS-only model, AQS + weighted PurpleAir, AQS + weighted PurpleAir + SMOTE.


Comparison to true-color composite
Comparison of prediction maps to true-color composites from MODIS instruments.

Bi et al.: Combining Machine Learning and Numerical Simulation for High-Resolution PM2.5 Concentration Forecast

Forecasting ambient PM2.5 concentrations with spatiotemporal coverage is key to alerting decision makers of pollution episodes and preventing detrimental public exposure. In this study, we developed a PM2.5 forecast framework by combining the robust Random Forest algorithm with a publicly accessible global CTM forecast product, NASA’s Goddard Earth Observing System “Composition Forecasting” (GEOS-CF), providing spatiotemporally continuous PM2.5 concentration forecasts for the next 5 days at a 1 km spatial resolution. Our forecast experiment was conducted for a region in Central China including the populous and polluted Fenwei Plain. Our model showed satisfactory forecast performance and substantially reduced the large biases in GEOS-CF. Our proposed framework requires minimal computational resources compared to running CTMs at urban scales, enabling near-real-time PM2.5 forecast in resource-restricted environments. 

Read online: https://doi.org/10.1021/acs.est.1c05578

Stowell et al.: Asthma exacerbation due to climate change-induced wildfire smoke in the Western US

Climate change and human activities have drastically altered the natural wildfire balance in the Western US and increased population health risks due to exposure to pollutants from fire smoke. Using dynamically downscaled climate model projections, we estimated additional asthma emergency room visits and hospitalizations due to exposure to smoke fine particulate matter (PM2.5) in the Western US in the 2050s. Under the ICLUS A2 population scenario, we estimated the smoke-related asthma events could increase at a rate of 15.1 visits per 10 000 persons in the Western US, with the highest rates of increased asthma (25.7–41.9 per 10 000) in Idaho, Montana, Oregon, and Washington. We estimated the healthcare costs of smoke-induced asthma exacerbation to be over $1.5 billion during a single future fire season. Open access article link here.

Wang et al.: A machine learning model to estimate ground-level ozone concentrations in California using TROPOMI data and high-resolution meteorology

Wenhao Wang, Xiong Liu, Jianzhao Bi, Yang Liu, A machine learning model to estimate ground-level ozone concentrations in California using TROPOMI data and high-resolution meteorology, Environment International, Volume 158, 2022, 106917

Abstract: Estimating ground-level ozone concentrations is crucial to study the adverse health effects of ozone exposure and better understand the impacts of ground-level ozone on biodiversity and vegetation. However, few studies have attempted to use satellite retrieved ozone as an indicator given their low sensitivity in the boundary layer. Using the Tropospheric Monitoring Instrument (TROPOMI)’s total ozone column together with the ozone profile information retrieved by the Ozone Monitoring Instrument (OMI), as the TROPOMI ozone profile product has not been released, we developed a machine learning model to estimate daily maximum 8-hour average ground-level ozone concentration at 10 km spatial resolution in California. In addition to satellite parameters, we included meteorological fields from the High-Resolution Rapid Refresh (HRRR) system at 3 km resolution and land-use information as predictors. Our model achieved an overall 10-fold cross-validation (CV) R2 of 0.84 with root mean square error (RMSE) of 0.0059 ppm, indicating a good agreement between model predictions and observations. Model predictions showed that the suburbs of the Los Angeles metropolitan area had the highest ozone levels, while the Bay Area and the Pacific coast had the lowest. High ozone levels are also seen in Southern California and along the east side of the Central Valley. TROPOMI data improved the estimate of extreme values when compared to a similar model without it. Our study demonstrates the feasibility and value of using TROPOMI data in the spatiotemporal characterization of ground-level ozone concentration.
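
The target variable above, the daily maximum 8-hour average ozone concentration, can be illustrated with a simplified Python sketch. It considers only 8-hour windows that fall entirely within one day's 24 hourly values and ignores the data-completeness rules that regulatory definitions add; the hourly values are hypothetical.

```python
def daily_max_8h_average(hourly, window=8):
    """Return the largest running `window`-hour mean within one day's values."""
    if len(hourly) < window:
        raise ValueError("not enough hourly values for one full window")
    running_means = [
        sum(hourly[start:start + window]) / window
        for start in range(len(hourly) - window + 1)
    ]
    return max(running_means)


# Hypothetical hourly ozone in ppm: low overnight, elevated in the afternoon.
hourly = [0.020] * 10 + [0.060] * 8 + [0.030] * 6
mda8 = daily_max_8h_average(hourly)
```

In this example the maximum comes from the window that exactly covers the eight elevated afternoon hours.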

Keywords: Surface ozone; TROPOMI; OMI; Ozone profile; HRRR; Random forest; Spatiotemporal distribution

Zhang et al.: A machine learning model to estimate ambient PM2.5 concentrations in industrialized highveld region of South Africa

Danlu Zhang, Linlin Du, Wenhao Wang, Qingyang Zhu, Jianzhao Bi, Noah Scovronick, Mogesh Naidoo, Rebecca M. Garland, Yang Liu. (2021). A machine learning model to estimate ambient PM2.5 concentrations in industrialized highveld region of South Africa. Remote Sensing of Environment, 266, 112713.

Elsevier: Link

Exposure to fine particulate matter (PM2.5) has been linked to a substantial disease burden globally, yet little has been done to estimate the population health risks of PM2.5 in South Africa due to the lack of high-resolution PM2.5 exposure estimates. We developed a random forest model to estimate daily PM2.5 concentrations at 1 km2 resolution in and around industrialized Gauteng Province, South Africa, by combining satellite aerosol optical depth (AOD), meteorology, land use, and socioeconomic data. We then compared PM2.5 concentrations in the study domain before and after the implementation of the new national air quality standards. Model-estimated PM2.5 levels successfully captured the temporal pattern recorded by ground observations. Spatially, the highest annual PM2.5 concentration appeared in central and northern Gauteng, including northern Johannesburg and the city of Tshwane. Since the 2016 changes in national PM2.5 standards, PM2.5 concentrations have decreased in most of our study region, although levels in Johannesburg and its surrounding areas have remained relatively constant.