Air Quality Class Prediction Using Machine Learning Methods Based on Monitoring Data and Secondary Modeling


2024 | Qian Liu, Bingyan Cui, Zhen Liu
This study introduces an air quality prediction methodology that combines machine learning with enhanced secondary data modeling. The research addresses the limitations of traditional primary Air Quality Index (AQI) forecasting models and the underutilization of meteorological data. The dataset comprises forecasted primary pollutant concentrations, primary meteorological conditions, actual meteorological observations, and pollutant concentration measurements from monitoring stations in Jinan, China, spanning July 23, 2020, to July 13, 2021. The study begins with a correlation analysis that selects ten meteorological factors, which are then assessed and ranked by their impact on different pollutant concentrations using univariate and multivariate significance analyses and a random forest approach. A seasonal characteristic analysis highlights the distinct seasonal impacts of temperature, humidity, air pressure, and general atmospheric conditions on six key air pollutants. In the evaluation of machine learning-based classification models, the Light Gradient Boosting Machine (LightGBM) classifier is the most effective, achieving an accuracy of 97.5% and an F1 score of 93.3%. For AQI prediction, the Long Short-Term Memory (LSTM) model outperforms the other models, with a goodness-of-fit of 91.37% for AQI predictions, 90.46% for O₃ predictions, and a perfect fit on the primary pollutant test set. The findings support the reliability and efficacy of the employed machine learning models in air quality forecasting, and highlight the importance of incorporating seasonal dynamics and using advanced machine learning techniques to improve prediction accuracy.
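The three headline metrics in the summary above (accuracy, F1 score, and goodness-of-fit) can be illustrated with a minimal Python sketch. It rests on assumptions the summary does not confirm: the F1 score is taken to be macro-averaged across classes, goodness-of-fit is taken to be the coefficient of determination (R²), and the class labels and AQI values are made-up illustration data, not results from the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical air quality classes (illustration only)
true_classes = ["good", "good", "moderate", "moderate", "unhealthy", "good"]
pred_classes = ["good", "moderate", "moderate", "moderate", "unhealthy", "good"]

# Hypothetical AQI values (illustration only)
true_aqi = [50.0, 80.0, 120.0, 100.0]
pred_aqi = [55.0, 75.0, 110.0, 100.0]

print("accuracy:", accuracy(true_classes, pred_classes))
print("macro F1:", macro_f1(true_classes, pred_classes))
print("R^2:", r_squared(true_aqi, pred_aqi))
```

Under this reading, a "perfect fit" corresponds to an R² of exactly 1, which occurs only when every prediction equals its observed value.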