Huang et al.: Satellite-Based Long-Term Spatiotemporal Trends in Ambient NO2 Concentrations and Attributable Health Burdens in China From 2005 to 2020.

Keyong Huang, Qingyang Zhu, Xiangfeng Lu, Dongfeng Gu, Yang Liu. (2023). Satellite-Based Long-Term Spatiotemporal Trends in Ambient NO2 Concentrations and Attributable Health Burdens in China From 2005 to 2020. GeoHealth, 7(5), e2023GH000798. doi: 10.1029/2023GH000798.




Previous studies of air pollution in China have focused on fine particulate matter and ozone. As one of the world's major NOx emitters, China has experienced serious air pollution in recent decades. However, the long-term spatiotemporal distribution of NO2 levels, especially before 2013, remained unknown. In the current study, we first gap-filled the missing OMI satellite NO2 values, then developed an ensemble machine learning model to estimate the spatiotemporal pattern of monthly mean NO2 concentrations at 0.05° spatial resolution from 2005 to 2020 in China, and further estimated the attributable disease burden. The ensemble model predictions agreed well with observations: the sample-based, temporal, and spatial cross-validation (CV) R2 values were 0.88, 0.82, and 0.73, respectively. In addition, our model can provide accurate historical NO2 concentrations, with both the by-year CV R2 and the external separate-year validation R2 reaching 0.80. The estimated national NO2 levels showed an increasing trend during 2005-2011, then decreased gradually until 2020, especially in 2012-2015. The annual mortality burden attributable to NO2 exposure ranged from 305 thousand to 416 thousand and varied considerably across provinces in China. Our model could provide reliable long-term NO2 predictions at high spatial resolution for epidemiological studies in China. Our results also highlight the heavy disease burden attributable to NO2 and call for policies to reduce NOx emissions in China.
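The three cross-validation schemes named above differ mainly in which unit is held out together. The sketch below is one common reading, not the paper's documented procedure: the record layout, the `assign_folds` helper, and the choice of 10 folds are all assumptions made for illustration.

```python
import random

def assign_folds(records, key, k=10, seed=0):
    """Assign each record a fold index in [0, k).

    key(record) returns the grouping unit. All records that share a unit
    land in the same fold, so that unit is held out as a whole.
    """
    units = sorted({key(r) for r in records})
    rng = random.Random(seed)
    rng.shuffle(units)
    fold_of_unit = {u: i % k for i, u in enumerate(units)}
    return [fold_of_unit[key(r)] for r in records]

# Hypothetical monthly records: 30 stations x 24 months.
records = [
    {"id": s * 24 + m, "station": s, "month": m}
    for s in range(30)
    for m in range(24)
]

# Sample-based CV: every record is its own unit.
sample_folds = assign_folds(records, key=lambda r: r["id"])
# Spatial CV: all months of a station are held out together.
spatial_folds = assign_folds(records, key=lambda r: r["station"])
# Temporal CV: all stations in a given month are held out together.
temporal_folds = assign_folds(records, key=lambda r: r["month"])
```

Holding out whole stations or whole months is stricter than holding out random records, which is consistent with the spatial and temporal R2 values being lower than the sample-based one.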


By-year cross validation and external validation using data from year 2020.


Annual mean mortality burden attributable to long-term NO2 exposure in China. (a) Annual NO2-related mortality burden per 100,000 persons at the provincial level in 2005, 2010, 2015, and 2020. (b) National mean NO2-related mortality burden from 2005 to 2020.

Meng and Hang et al.: A satellite-driven model to estimate long-term particulate sulfate levels and attributable mortality burden in China

Meng, X., Hang, Y., Lin, X., Li, T., Wang, T., Cao, J., Fu, Q., Dey, S., Huang, K., Liang, F. and Kan, H., 2023. A satellite-driven model to estimate long-term particulate sulfate levels and attributable mortality burden in China. Environment International, p.107740.


Ambient fine particulate matter (PM2.5) pollution is a major environmental and public health challenge in China. Over the past decade, PM2.5 levels have decreased dramatically, driven mainly by reductions in particulate sulfate. This is because desulfurization, a process that removes sulfur (the precursor of sulfate) from coal-fired power plants and industrial facilities, is one of the major air quality policies in China. Therefore, it is necessary to characterize the variation of sulfate, as it is a strong indicator for assessing existing efforts to improve air quality. However, estimating the long-term spatiotemporal trend of sulfate is challenging because a ground monitoring network for PM2.5 constituents has not been established. Fortunately, spaceborne sensors such as the Multi-angle Imaging SpectroRadiometer (MISR) instrument can provide complementary information on aerosol types. With the help of state-of-the-art machine learning techniques, we developed a sulfate prediction model at a spatial resolution of 10 km using available ground measurements, MISR’s aerosol optical depth data, and other atmospheric reanalysis data. Compared with classical chemical transport models, our sulfate model performs better, with a random cross-validation R2 of 0.9 at the monthly level. We found that the national mean population-weighted sulfate concentration was relatively stable before the Air Pollution Prevention and Control Action Plan was introduced in 2013, but decreased dramatically, by 28.7%, from 2013 to 2018. Correspondingly, the total non-accidental and cardiopulmonary deaths attributed to sulfate decreased by 40.7% and 42.3%, respectively. Our study’s reliable sulfate estimates will advance future studies evaluating air quality policies and improve understanding of the adverse health effects of particulate sulfate.
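A population-weighted mean concentration weights each grid cell by the number of people living in it, so densely populated cells dominate the national figure. The following is a minimal sketch of that arithmetic and of a percent change between two years; the function names and the three-cell example values are hypothetical and are not taken from the study.

```python
def population_weighted_mean(concentrations, populations):
    """Mean concentration weighted by the population of each grid cell."""
    if len(concentrations) != len(populations):
        raise ValueError("inputs must have the same length")
    total_population = sum(populations)
    if total_population <= 0:
        raise ValueError("total population must be positive")
    weighted_sum = sum(c * p for c, p in zip(concentrations, populations))
    return weighted_sum / total_population

def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0

# Hypothetical grid: three cells, one of them unpopulated.
sulfate = [10.0, 20.0, 5.0]   # concentration per cell
people = [100, 300, 0]        # residents per cell

pwm = population_weighted_mean(sulfate, people)   # (1000 + 6000) / 400
change = percent_change(20.0, 15.0)               # a 25% decrease
```

The unpopulated cell contributes nothing to the weighted mean, which is the point of weighting by population rather than by area.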

Vu et al.: Application of geostationary satellite and high-resolution meteorology data in estimating hourly PM2.5 levels during the Camp Fire episode in California

Vu, B.N., Bi, J., Wang, W., Huff, A., Kondragunta, S., Liu, Y. (2022). Application of geostationary satellite and high-resolution meteorology data in estimating hourly PM2.5 levels during the Camp Fire episode in California. Remote Sensing of Environment, 271, 112890.


Particulate matter from wildland fire smoke can travel hundreds of kilometers from where it originated and result in excess morbidity and mortality. However, estimating PM2.5 levels in and around wildland fires poses many challenges, including the lack of monitors at or near smoke plumes and limited temporal coverage, since monitors generally obtain daily measurements while smoke plumes change continually throughout the day with the wind. In this paper, we estimated total PM2.5 concentrations during the Camp Fire episode, one of the deadliest wildland fires in California history. We used a random forest (RF) model to calibrate hourly regulatory monitor measurements (AQS, maintained by the EPA) and low-cost sensors (PurpleAir) to Geostationary Operational Environmental Satellite-16 (GOES-16) AOD and fire spot detection variables, High-Resolution Rapid Refresh (HRRR) meteorology, and ancillary variables including land use, distance to road, and a convolutional layer calculated for each hour. We tried three separate approaches: 1) an AQS-only model; 2) an AQS + weighted PurpleAir model; and 3) an AQS + weighted PurpleAir + SMOTE model. The Synthetic Minority Oversampling Technique (SMOTE) was applied to increase the number of extremely high observations and improve model performance. The AQS-only model achieved an out-of-bag (OOB) R2 (RMSE) of 0.84 (12.00 μg/m3) and spatial and temporal cross-validation (CV) R2 (RMSE) of 0.74 (16.28 μg/m3) and 0.73 (16.58 μg/m3), respectively. The AQS + weighted PurpleAir model’s OOB, spatial CV, and temporal CV R2 (RMSE) were 0.86 (9.52 μg/m3), 0.75 (14.93 μg/m3), and 0.79 (11.89 μg/m3), respectively. The AQS + weighted PurpleAir + SMOTE model’s OOB, spatial CV, and temporal CV R2 (RMSE) were 0.92 (10.44 μg/m3), 0.84 (12.36 μg/m3), and 0.85 (14.88 μg/m3), respectively. Results from this study indicate that additional measurements and the application of SMOTE improve not only model performance but also residual errors, and may aid epidemiological studies investigating intense and acute exposure to wildland fire smoke.
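The core idea of SMOTE is to create synthetic samples by interpolating between a rare observation and one of its nearest rare neighbours. The sketch below illustrates only that idea in plain Python. The abstract does not say how the PM2.5 target was treated or which implementation was used, so interpolating whole rows (predictor and target together), the `smote` helper, and the example values are all assumptions.

```python
import math
import random

def smote(rows, n_new, k=3, seed=0):
    """Create n_new synthetic rows from the rare rows in `rows`.

    Each synthetic row lies on the line segment between a randomly chosen
    row and one of its k nearest neighbours (Euclidean distance).
    """
    if len(rows) < 2:
        raise ValueError("need at least two rows to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(rows))
        base = rows[i]
        neighbours = sorted(
            (r for j, r in enumerate(rows) if j != i),
            key=lambda r: math.dist(r, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic

# Hypothetical extreme observations: (AOD, PM2.5 in ug/m3).
high_rows = [(0.9, 150.0), (1.1, 180.0), (1.4, 210.0), (1.0, 160.0)]
new_rows = smote(high_rows, n_new=5)
```

Because every synthetic row is a convex combination of two observed rows, the oversampled data never leaves the range already spanned by the extreme observations.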


Density Scatter Plot
Density Scatter Plots of spatial and temporal CV from left to right: Top (spatial CVs for AQS-only, AQS + weighted PurpleAir, AQS + weighted PurpleAir + SMOTE); Bottom (temporal CVs for AQS-only, AQS + weighted PurpleAir, AQS + weighted PurpleAir + SMOTE)


Blowout Maps
Prediction Maps at noon on November 16th, 2018. Blowouts from left to right: AQS-only model, AQS + weighted PurpleAir, AQS + weighted PurpleAir + SMOTE.


Comparison to true-color composite
Comparison of prediction maps to true-color composites from MODIS instruments.

Bi et al.: Combining Machine Learning and Numerical Simulation for High-Resolution PM2.5 Concentration Forecast

Forecasting ambient PM2.5 concentrations with spatiotemporal coverage is key to alerting decision makers to pollution episodes and preventing detrimental public exposure. In this study, we developed a PM2.5 forecast framework by combining the robust Random Forest algorithm with a publicly accessible global CTM forecast product, NASA’s Goddard Earth Observing System “Composition Forecasting” (GEOS-CF), providing spatiotemporally continuous PM2.5 concentration forecasts for the next 5 days at a 1 km spatial resolution. Our forecast experiment was conducted for a region in Central China that includes the populous and polluted Fenwei Plain. Our model showed satisfactory forecast performance and substantially reduced the large biases in GEOS-CF. Our proposed framework requires minimal computational resources compared to running CTMs at urban scales, enabling near-real-time PM2.5 forecasts in resource-restricted environments.


Wang et al.: A machine learning model to estimate ground-level ozone concentrations in California using TROPOMI data and high-resolution meteorology

Wenhao Wang, Xiong Liu, Jianzhao Bi, Yang Liu, A machine learning model to estimate ground-level ozone concentrations in California using TROPOMI data and high-resolution meteorology, Environment International, Volume 158, 2022, 106917

Abstract: Estimating ground-level ozone concentrations is crucial for studying the adverse health effects of ozone exposure and for better understanding the impacts of ground-level ozone on biodiversity and vegetation. However, few studies have attempted to use satellite-retrieved ozone as an indicator, given its low sensitivity in the boundary layer. Using the total ozone column from the TROPOspheric Monitoring Instrument (TROPOMI) together with the ozone profile information retrieved by the Ozone Monitoring Instrument (OMI), as the TROPOMI ozone profile product had not been released, we developed a machine learning model to estimate daily maximum 8-hour average ground-level ozone concentrations at 10 km spatial resolution in California. In addition to satellite parameters, we included meteorological fields from the High-Resolution Rapid Refresh (HRRR) system at 3 km resolution and land-use information as predictors. Our model achieved an overall 10-fold cross-validation (CV) R2 of 0.84 with a root mean square error (RMSE) of 0.0059 ppm, indicating good agreement between model predictions and observations. Model predictions showed that the suburbs of the Los Angeles metropolitan area had the highest ozone levels, while the Bay Area and the Pacific coast had the lowest. High ozone levels were also seen in Southern California and along the east side of the Central Valley. TROPOMI data improved the estimates of extreme values when compared to a similar model without them. Our study demonstrates the feasibility and value of using TROPOMI data in the spatiotemporal characterization of ground-level ozone concentrations.
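The modeled quantity, the daily maximum 8-hour average, is the largest of the running 8-hour means within a day. The sketch below is a simplified version that only uses windows lying entirely inside one day's 24 hourly values; regulatory definitions differ in which windows count, and the abstract does not specify the exact rule, so the `mda8` helper and the example values are assumptions.

```python
def mda8(hourly):
    """Daily maximum 8-hour average from 24 hourly concentrations.

    Simplification: only the 17 windows that fit inside the day are used.
    """
    if len(hourly) != 24:
        raise ValueError("expected 24 hourly values")
    window_means = [
        sum(hourly[start:start + 8]) / 8
        for start in range(24 - 8 + 1)
    ]
    return max(window_means)

# Hypothetical day (ppm): low overnight, an 8-hour afternoon plateau, then decline.
hourly_ozone = [0.020] * 10 + [0.060] * 8 + [0.030] * 6
daily_value = mda8(hourly_ozone)
```

In this example the window covering the whole afternoon plateau gives the maximum, so the daily value equals the plateau concentration.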

Keywords: Surface ozone; TROPOMI; OMI; Ozone profile; HRRR; Random forest; Spatiotemporal distribution

Zhang et al.: A machine learning model to estimate ambient PM2.5 concentrations in industrialized highveld region of South Africa

Danlu Zhang, Linlin Du, Wenhao Wang, Qingyang Zhu, Jianzhao Bi, Noah Scovronick, Mogesh Naidoo, Rebecca M. Garland, Yang Liu. (2021). A machine learning model to estimate ambient PM2.5 concentrations in industrialized highveld region of South Africa. Remote Sensing of Environment, 266, 112713.


Exposure to fine particulate matter (PM2.5) has been linked to a substantial disease burden globally, yet little has been done to estimate the population health risks of PM2.5 in South Africa due to the lack of high-resolution PM2.5 exposure estimates. We developed a random forest model to estimate daily PM2.5 concentrations at 1 km2 resolution in and around industrialized Gauteng Province, South Africa, by combining satellite aerosol optical depth (AOD), meteorology, land use, and socioeconomic data. We then compared PM2.5 concentrations in the study domain before and after the implementation of the new national air quality standards. Model-estimated PM2.5 levels successfully captured the temporal pattern recorded by ground observations. Spatially, the highest annual PM2.5 concentration appeared in central and northern Gauteng, including northern Johannesburg and the city of Tshwane. Since the 2016 changes in national PM2.5 standards, PM2.5 concentrations have decreased in most of our study region, although levels in Johannesburg and its surrounding areas have remained relatively constant.




Wang et al.: Satellite-based assessment of the long-term efficacy of PM2.5 pollution control policies across the Taiwan Strait

Evaluating the efficacy of air pollution control policies is an essential part of the decision-making process to develop new policies and improve existing measures. In this analysis, we assessed the effects of air pollution control policies in the Taiwan Strait Region from 2005 to 2018 using full-coverage, high-resolution PM2.5 generated by a satellite-driven machine learning model. A ten-fold cross-validation of our prediction model showed an R2 value of 0.89, demonstrating that these predictions can be used for policy evaluation. During the 14-year period, PM2.5 levels in all areas of Fujian and Taiwan decreased significantly. Separate regression models for policy evaluation in Taiwan and Fujian showed that all considered policies have mitigated PM2.5 pollution to various degrees. The Clean Air Action Plans (CAAP) is the most effective control policy in Taiwan, while the Action Plan of Air Pollution Prevention and Control (APPC-AP) and the Three-year Action Plan for Blue Skies (3YAP-BS), as well as their provincial implementation plans, are the most successful in Fujian. The effectiveness of control policies, however, varies by land-use type, especially in Taiwan.



Stowell et al.: Estimating PM2.5 in Southern California using satellite data: factors that affect model performance.

In the article, the authors focus on a region where traditional satellite AOD models have not performed as well as in other areas of the US, in order to determine which region-specific parameters have the greatest impact on model accuracy. Using a two-stage linear approach, the authors identified important meteorological and land use parameters including temperature, relative humidity, wind, road distance, distance to coast, and other local factors. Of note, a variable representing the ratio of PM2.5 to PM10 particles was included to account for airborne dust. This parameter is not generally used in other regional analyses and was an important addition to the Southern California model, improving R2 from 0.70 to 0.80. This study adds significantly to the current body of research in the Southwestern US, suggesting that the influence of PM10 on AOD should be considered in areas with high concentrations of airborne dust.

Geng et al.: Random forest models for PM2.5 speciation using MISR data

Random forest models were developed to predict ground-level daily PM2.5 speciation concentrations in California from MISR fractional AODs and other supporting data such as ground measurements, chemical transport model simulations, land use variables, and meteorological fields. Sensitivity tests were also conducted to explore the influence of variable selection on model performance. Results show that fractional AODs and total AOD have similar predictive power in estimating PM2.5 species if there are sufficient ground measurements and predictor data to support the most sophisticated model structure. Otherwise, models using fractional AODs outperform those using total AOD. PM2.5 speciation concentrations are more sensitive to land use variables than to other supporting data such as CTM simulations and meteorological information.

This article was published in Environmental Research Letters in March 2020.


She et al.: Hourly PM2.5 levels during heavy winter episodes in the Yangtze River Delta

In this publication, we quantitatively investigated the feasibility of using aerosol optical depth (AOD) data retrieved by the Geostationary Ocean Color Imager (GOCI) to estimate hourly PM2.5 concentrations during winter haze episodes in the Yangtze River Delta (YRD). We developed a three-stage spatial statistical model using GOCI AOD and fine mode fraction, together with the corresponding monitored PM2.5 concentrations and meteorological and land use data, on a 6-km modeling grid with complete coverage in time and space. The 10-fold cross-validation R2 was 0.72, with a regression slope of 1.01 between observed and predicted hourly PM2.5 concentrations. We further analyzed two representative large regional episodes, i.e., a “multi-process diffusion episode” during December 21–26, 2015 and a “Chinese New Year episode” during February 7–8, 2016. We concluded that AOD retrieved by geostationary satellites could serve as a valuable new data source for analyzing heavy air pollution episodes.

