Revealing the genesis of NO3-N pollution gradient in shallow groundwater of an intensely anthropogenic-impacted Basin by explainable machine learning

Published: 3 May 2023 | Version 1 | DOI: 10.17632/24dzpdwb6h.1
…, Jichao Sun


This dataset includes two machine learning code files and one data file. The first notebook ('Four machine learning models .ipynb') implements four regression models: multiple linear regression, Bayesian ridge regression, random forest, and gradient boosting regression tree. The second notebook ('Feature importance .ipynb') evaluates feature importance based on the random forest model. The data file, in .csv format, contains records for 599 sampling points in the Shaying River Basin. It covers groundwater quality, soil, meteorological, hydrogeological vulnerability, and land use data, and the table has 599 rows and 16 columns.
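The following is a minimal sketch of how the four regressors and a random-forest feature ranking could be compared on a table like this one. It assumes scikit-learn and NumPy. The helper name `compare_models`, the 80/20 split, the hyperparameters, and the synthetic stand-in data (15 predictors plus 1 target) are all hypothetical and are not taken from the published notebooks. The ranking uses scikit-learn's impurity-based `feature_importances_`, which is only one of several importance measures, so the notebook's own method may differ.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import BayesianRidge, LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def compare_models(X, y, feature_names, seed=42):
    """Fit four regressors; return test R^2 per model and RF feature importances.

    Hypothetical helper: the split and hyperparameters are illustrative
    assumptions, not the settings used in the published notebooks.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    models = {
        "MLR": LinearRegression(),                # multiple linear regression
        "BRR": BayesianRidge(),                   # Bayesian ridge regression
        "RF": RandomForestRegressor(n_estimators=200, random_state=seed),
        "GBRT": GradientBoostingRegressor(random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = r2_score(y_test, model.predict(X_test))

    # Impurity-based importances from the fitted random forest, largest first.
    rf = models["RF"]
    importances = sorted(
        zip(feature_names, rf.feature_importances_),
        key=lambda kv: kv[1],
        reverse=True,
    )
    return scores, importances


if __name__ == "__main__":
    # Synthetic stand-in for the 599-row table (assumption: 15 predictors + 1 target).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(599, 15))
    y = 3.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=599)
    names = [f"x{i}" for i in range(15)]
    scores, importances = compare_models(X, y, names)
    print(scores)
    print(importances[:3])
```

With the real data, `X` would be the predictor columns of the CSV and `y` the NO3-N column. Which of the 16 columns play each role is not stated in this record, so that mapping has to be read from the notebooks.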


Steps to reproduce

The machine learning code files in this dataset are Jupyter notebooks and can be opened in Jupyter Notebook with Python 3.9. To run them, the input data must be in CSV format, and the data file path in the code must be replaced with the path to that CSV file.
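Before you edit the notebooks, you can check that the CSV you point them to has the dimensions stated in the description. The sketch below uses only the Python standard library. The file name in `DATA_PATH` and the helper `load_table` are hypothetical. It also assumes that the 599 × 16 size excludes a single header row.

```python
import csv
from pathlib import Path

# Hypothetical path: replace with wherever the dataset's .csv file was saved.
DATA_PATH = Path("shaying_river_basin.csv")


def load_table(path, expected_shape=(599, 16)):
    """Read a CSV with one header row and check its dimensions.

    Returns (header, rows). The 599 x 16 default comes from the dataset
    description; treating the header as separate from the 599 data rows
    is an assumption.
    """
    # "utf-8-sig" also accepts files saved with a byte-order mark.
    with open(path, newline="", encoding="utf-8-sig") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = [row for row in reader if row]  # skip blank trailing lines
    n_rows, n_cols = len(rows), len(header)
    if expected_shape is not None and (n_rows, n_cols) != tuple(expected_shape):
        raise ValueError(
            f"expected {expected_shape[0]} rows x {expected_shape[1]} columns, "
            f"got {n_rows} x {n_cols}"
        )
    return header, rows
```

Usage: `header, rows = load_table(DATA_PATH)`. If the notebooks load data with pandas, the equivalent check is `pd.read_csv(path)` followed by inspecting `df.shape`.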


China University of Geosciences (Beijing), School of Water Resources and Environment


Machine Learning, Groundwater Contamination, Nitrate