Abstract
This study compares two machine learning models, a Random Forest (RF) and a spatial Graph Neural Network (GNN), for predicting nitrogen dioxide (NO₂) concentrations across diverse urban conditions in Berlin, Germany. Both models use information on local land-use characteristics, meteorological conditions, and seasonal greenery, which enables a post-hoc analysis of high-concentration scenarios under varying environmental factors. Unlike most previous approaches to air-pollution estimation, this study explicitly considers urban greenery together with its seasonal variation. The analysis is based on a self-curated, high-resolution, site-level environmental dataset that combines hourly NO₂ observations from sixteen monitoring stations across Berlin in 2023 with detailed land-use, traffic, and architectural data obtained from the Berlin Geoportal. This dataset is supplemented with multiple meteorological records from the Deutscher Wetterdienst (DWD). While both models achieve comparable accuracy (R² ≈ 0.6), the GNN tends to show less variation in predictive accuracy across test sites, suggesting potentially greater spatial robustness. In terms of explainability, only the RF model supports local interpretation via Shapley values, which indicate that urban greenery helps reduce NO₂ levels, with an effect that depends on seasonal changes in leaf area. However, additional statistical testing does not confirm this observed trend. Beyond the model comparison, this research contributes a comprehensive environmental dataset that links air quality, land-use, and meteorological variables at hourly resolution. This resource supports future investigations into how environmental and spatial factors jointly influence pollutant dispersion and decomposition in urban environments.
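The comparison rests on R² and on how much that accuracy varies between held-out monitoring sites. The sketch below shows one way to quantify both, assuming predictions for held-out stations are already available. The function names, the use of the sample standard deviation as the spread measure, and the synthetic sites and "models" are illustrative assumptions, not the authors' evaluation protocol or data.

```python
import numpy as np


def r2_score(y_true, y_pred) -> float:
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # zero if a site's series is constant
    return float(1.0 - ss_res / ss_tot)


def per_site_r2(site_ids, y_true, y_pred) -> dict:
    """R^2 computed separately for every held-out monitoring site (hypothetical helper)."""
    site_ids = np.asarray(site_ids)
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return {
        str(site): r2_score(y_true[site_ids == site], y_pred[site_ids == site])
        for site in np.unique(site_ids)
    }


def spread(scores: dict) -> tuple:
    """Mean and sample standard deviation of per-site R^2 (needs at least 2 sites)."""
    values = np.array(list(scores.values()), dtype=float)
    return float(values.mean()), float(values.std(ddof=1))


# Usage with synthetic data (hypothetical sites and models, not the study's data).
rng = np.random.default_rng(0)
sites = np.repeat(["S1", "S2", "S3", "S4"], 500)
y = rng.gamma(shape=4.0, scale=6.0, size=sites.size)  # synthetic hourly NO2 values
model_a = y + rng.normal(0.0, 8.0, size=y.size)  # similar error at every site
# Model B is deliberately twice as noisy at site S4, so its per-site R^2 varies more.
model_b = y + rng.normal(0.0, 8.0, size=y.size) * np.where(sites == "S4", 2.0, 1.0)
for name, pred in [("A", model_a), ("B", model_b)]:
    print(name, spread(per_site_r2(sites, y, pred)))
```

Two models can share a similar mean R² while differing in the standard deviation across sites; a smaller spread is one simple reading of "less variation of predictive accuracy across test sites".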
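Shapley values underpin the RF's local interpretability. To show what such a value measures, here is a brute-force exact computation, assuming a single reference ("baseline") point stands in for features that are absent from a coalition. This is only one of several possible value-function choices, and tree ensembles are in practice explained with faster tree-specific algorithms such as TreeSHAP (for example `shap.TreeExplainer` in the `shap` package). The function name, the feature interpretation, and the coefficients below are hypothetical.

```python
from itertools import combinations
from math import factorial

import numpy as np


def exact_shapley(predict, x, baseline) -> np.ndarray:
    """Exact Shapley values of `predict` at `x`, relative to one baseline point.

    A feature absent from a coalition is replaced by its baseline value.
    Each feature requires enumerating all 2^(n-1) coalitions of the others, so
    this brute-force version is only practical for a handful of features.
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    n = x.size

    def value(coalition) -> float:
        idx = np.array(coalition, dtype=int)
        z = baseline.copy()
        z[idx] = x[idx]  # features in the coalition take their observed values
        return float(predict(z))

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi


# Hypothetical linear surrogate with three illustrative features
# (e.g. leaf area index, traffic intensity, temperature); coefficients are made up.
weights = np.array([-4.0, 2.0, 0.5])


def predict(z):
    return 30.0 + weights @ z


phi = exact_shapley(predict, [3.0, 10.0, 15.0], [1.0, 8.0, 10.0])
print(phi)  # for a linear model phi_i = w_i * (x_i - b_i), i.e. about [-8.0, 4.0, 2.5]
```

The attributions sum to the difference between the prediction at the observed point and at the baseline (the efficiency property), so a negative value for a greenery feature reads as "this feature lowered the predicted NO₂ relative to the reference".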
| Field | Value |
|---|---|
| Original language | English |
| Article number | 103568 |
| Journal | Ecological Informatics |
| Volume | 93 |
| Publication status | Published - Feb 2026 |
| MoE publication type | A1 Journal article-refereed |
Keywords
- Air pollution mitigation
- Graph Neural Networks
- Leaf Area Index
- Nitrogen dioxide prediction
- Random Forest
- Spatial air quality interpolation