Water quality prediction based on sparse dataset using enhanced machine learning

2024 | Sheng Huang, Jun Xia, Yueling Wang, Jiarui Lei, Gangsheng Wang
This study presents an approach to predicting water quality from sparse datasets using enhanced machine learning. The research focuses on the river-lake confluence of Dongting Lake and the Yangtze River, where hydrological patterns are complex. The study evaluates traditional Recurrent Neural Networks (RNNs) and three Long Short-Term Memory (LSTM) models, integrated with the Load Estimator (LOADEST), for predicting water quality parameters such as chemical oxygen demand (CODMn) and ammonia nitrogen (NH3N).

The Self-Attentive LSTM (SA-LSTM) model combined with LOADEST outperformed the other models, achieving Nash-Sutcliffe Efficiency (NSE) scores of 0.71 for CODMn and 0.57 for NH3N. Compared with the standalone SA-LSTM model, SA-LSTM-LOADEST reduced Root Mean Square Error (RMSE) by 24.6% for CODMn and 21.3% for NH3N. The model maintained its accuracy when data collection intervals were extended from weekly to monthly, and it could forecast pollution loads up to ten days in advance.

The study highlights the potential of machine learning to improve water quality modeling in regions with limited monitoring capabilities. It also discusses the importance of incorporating environmental factors and improving water quality monitoring to enhance model accuracy. The results indicate that SA-LSTM-LOADEST is a promising tool for pollution load modeling with sparse data, particularly in complex river-lake confluences.
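For readers unfamiliar with the two metrics quoted in the summary, the following is a minimal sketch of how NSE, RMSE, and a percentage RMSE reduction are conventionally computed. It uses the standard textbook definitions and is not the authors' code. The function names and the sample values are hypothetical and chosen only for illustration.

```python
import math


def nse(observed, predicted):
    """Nash-Sutcliffe Efficiency: 1 - (sum of squared errors) /
    (sum of squared deviations of the observations from their mean).
    A value of 1.0 is a perfect fit; 0.0 is no better than predicting the mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def rmse(observed, predicted):
    """Root Mean Square Error, in the same units as the observations."""
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sse / len(observed))


def rmse_reduction_pct(rmse_baseline, rmse_new):
    """Relative RMSE reduction of a new model against a baseline, in percent."""
    return 100.0 * (rmse_baseline - rmse_new) / rmse_baseline


# Hypothetical concentrations (mg/L), not data from the study.
observed = [2.0, 3.0, 4.0, 5.0]
predicted = [2.5, 2.5, 4.5, 4.5]

print(f"NSE  = {nse(observed, predicted):.2f}")
print(f"RMSE = {rmse(observed, predicted):.2f}")
```

Under these definitions, an NSE of 0.71 means the model's squared error is 29% of the squared error obtained by always predicting the observed mean, and an RMSE reduction of 24.6% means the combined model's RMSE is about three quarters of the standalone model's.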