Water quality prediction based on sparse dataset using enhanced machine learning

Accepted 19 February 2024 | Sheng Huang, Jun Xia, Yueling Wang, Jiarui Lei, Gangsheng Wang
This study explores the use of machine learning, specifically Long Short-Term Memory (LSTM) models, to predict water quality at a river-lake confluence where monitoring data are sparse and partially missing. The research couples LSTM models with the Load Estimator (LOADEST) to improve the accuracy of pollution load predictions. The Self-Attentive LSTM (SA-LSTM) model, in particular, outperforms the other LSTM models and a traditional Recurrent Neural Network (RNN) in predicting water quality parameters such as CODMn and NH3N. The combined SA-LSTM-LOADEST model achieved Nash-Sutcliffe Efficiency (NSE) scores of 0.71 for CODMn and 0.57 for NH3N, and reduced Root Mean Square Error (RMSE) by 24.6% and 21.3%, respectively, compared to the standalone SA-LSTM model. Performance remains robust when data collection intervals are extended from weekly to monthly, and the model can forecast pollution loads up to ten days in advance. The study demonstrates the potential of combining LSTM and LOADEST to improve water quality modeling in regions with limited monitoring capabilities.
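For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how NSE and RMSE are conventionally computed from paired observed and predicted values. It uses the standard textbook definitions and is not the authors' code; the function names and the sample numbers are illustrative assumptions and do not come from the study.

```python
import math


def rmse(observed, predicted):
    """Root Mean Square Error between paired observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))


def nse(observed, predicted):
    """Nash-Sutcliffe Efficiency: 1 - SSE / sum of squared deviations of the
    observations from their own mean. 1.0 is a perfect fit; 0.0 means the
    model is no better than always predicting the observed mean."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    if sst == 0:
        raise ValueError("NSE is undefined when all observations are equal")
    return 1.0 - sse / sst


def rmse_reduction_pct(baseline_rmse, new_rmse):
    """Percentage reduction in RMSE relative to a baseline model."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse


# Made-up concentrations (mg/L) for illustration only, not data from the study.
observed = [2.0, 3.0, 4.0, 5.0]
predicted = [2.5, 2.5, 4.5, 4.5]

print(f"NSE  = {nse(observed, predicted):.2f}")
print(f"RMSE = {rmse(observed, predicted):.2f}")
```

Under these definitions, a reported RMSE reduction of 24.6% means the combined model's RMSE is about 0.754 times that of the baseline it is compared against.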