Revealing the genesis of NO3-N pollution gradient in shallow groundwater of an intensely anthropogenic-impacted Basin by explainable machine learning
Description
This dataset includes two machine learning code files and one data file. The code files implement four regression models, namely multiple linear regression, Bayesian ridge regression, random forest, and gradient boosting regression trees (‘Four machine learning models .ipynb’), and a feature importance evaluation based on the random forest model (‘Feature importance .ipynb’). The data file, in .csv format, contains data for 599 sampling points in the Shaying River Basin, covering groundwater quality, soil, meteorological, hydrogeological vulnerability, and land use variables, arranged as a table of 599 rows × 16 columns.
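To make the model comparison concrete, the following is a minimal sketch using scikit-learn, not the code from the published notebooks. The helper names (`build_models`, `compare_models`, `rf_feature_importance`), the abbreviations MLR/BRR/RF/GBRT, the default hyperparameters, the single 70/30 hold-out split scored by R², and the impurity-based importance measure are all illustrative assumptions; the notebooks may tune the models, validate them, or measure importance differently. The demo uses synthetic data shaped like 599 samples with 15 hypothetical predictors, not the real dataset.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import BayesianRidge, LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def build_models(random_state=0):
    """Return the four regressors named in the description (hypothetical settings)."""
    return {
        "MLR": LinearRegression(),  # multiple linear regression
        "BRR": BayesianRidge(),  # Bayesian ridge regression
        "RF": RandomForestRegressor(random_state=random_state),  # random forest
        "GBRT": GradientBoostingRegressor(random_state=random_state),  # gradient boosting trees
    }


def compare_models(X, y, test_size=0.3, random_state=0):
    """Fit each model on a training split and return its R² on the held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    scores = {}
    for name, model in build_models(random_state).items():
        model.fit(X_train, y_train)
        scores[name] = r2_score(y_test, model.predict(X_test))
    return scores


def rf_feature_importance(X, y, feature_names, random_state=0):
    """Rank features by impurity-based random forest importance (one possible measure)."""
    rf = RandomForestRegressor(random_state=random_state).fit(X, y)
    order = np.argsort(rf.feature_importances_)[::-1]  # most important first
    return [(feature_names[i], float(rf.feature_importances_[i])) for i in order]


# Synthetic stand-in data: the target depends on x0 and x1 only, plus noise.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(599, 15))
y_demo = 3 * X_demo[:, 0] - 2 * X_demo[:, 1] + rng.normal(scale=0.5, size=599)
names = [f"x{i}" for i in range(15)]

scores = compare_models(X_demo, y_demo)
ranking = rf_feature_importance(X_demo, y_demo, names)
```

On real data, `X_demo` and `y_demo` would be replaced by the predictors and the NO3-N target read from the CSV file; a k-fold cross-validation would give a more stable comparison than one split.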
Steps to reproduce
The code files in this dataset are Jupyter notebooks and can be opened in a Jupyter Notebook environment running Python 3.9. To run them, convert the required input data to CSV format and replace the data file path in the code with the path to that CSV file.
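As a rough illustration of the data-loading step, the sketch below reads a CSV with pandas and splits it into predictors and a target. The function `load_dataset`, the target column name `NO3-N`, the shape check against 599 × 16, and the rule of dropping non-numeric columns (such as sample IDs) are assumptions for illustration, not taken from the published notebooks; adjust them to the actual file header.

```python
import io

import pandas as pd

# Hypothetical target column name; the real CSV header may differ.
TARGET_COLUMN = "NO3-N"


def load_dataset(csv_source, target_column=TARGET_COLUMN, expected_shape=(599, 16)):
    """Read a CSV (path or file-like object) and return predictors X and target y."""
    df = pd.read_csv(csv_source)
    # Optional sanity check against the documented table size.
    if expected_shape is not None and df.shape != expected_shape:
        raise ValueError(f"expected shape {expected_shape}, got {df.shape}")
    if target_column not in df.columns:
        raise KeyError(f"target column {target_column!r} not found")
    y = df[target_column]
    # Keep only numeric predictors, dropping text columns such as sample IDs.
    X = df.drop(columns=[target_column]).select_dtypes(include="number")
    return X, y


# Usage with a tiny in-memory CSV (hypothetical columns, shape check disabled).
demo_csv = io.StringIO("site_id,NO3-N,pH,rainfall\nS1,12.5,7.2,700\nS2,3.25,7.8,650\n")
X_demo, y_demo = load_dataset(demo_csv, expected_shape=None)
```

With the real file, calling `load_dataset("path/to/data.csv")` (a placeholder path) keeps the default shape check, so a mismatched or partially converted CSV fails early instead of silently training on the wrong columns.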