Data and code for diffuse anthropogenic water-pollution assessment under climate and land-use change
Description
This dataset contains the processed data, core result tables, source-provenance metadata, and core analysis scripts supporting the study “Explainable machine learning for diffuse anthropogenic water-pollution assessment and adaptive watershed management under climate and land-use change”. It is intended for reproducing and checking the manuscript results: temporal, spatial, and external validation of the machine-learning models, feature-attribution analyses, uncertainty diagnostics, pesticide occurrence screening, health-screening outputs, and HUC8 watershed-priority outputs for adaptive monitoring and mitigation planning.

The workflow integrates water-quality monitoring records with climate, land-use, agrochemical, watershed, and monitoring-context information derived from open sources, including Water Quality Portal, GEMStat, EEA Waterbase, GLORICH, CHIRPS, HYDE, and FAOSTAT. The archive includes the cleaned analysis data (harmonized nutrient station-year records, pesticide occurrence-screening data, and environmental covariates), model-evaluation and interpretation result tables, figure-supporting results, source-provenance documentation, and the core scripts required to reproduce the main analytical workflow.

Large third-party raw source datasets are not redistributed in full. Source links, retrieval information, and processing metadata are provided instead, so that users can trace the original data sources and understand how the analysis-ready dataset was constructed.
Steps to reproduce
Download and extract the archive, then install the required Python packages listed in the package documentation. From the code directory, run the scripts in numerical order. The workflow reproduces the harmonized nutrient and pesticide analysis tables, model-evaluation outputs, feature-attribution summaries, uncertainty diagnostics, HUC8 priority products, and figure-supporting result files. Large third-party raw source datasets are not redistributed; source links and retrieval metadata are provided in the provenance files.
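Running the scripts in numerical order can be automated with a small helper. The following is a minimal sketch only, not part of the archive: it assumes the scripts are Python files whose names begin with a number (for example `01_...py`), and that the extracted folder is named `code`. The function names `numeric_key`, `ordered_scripts`, and `run_all` are hypothetical; adjust the folder and naming pattern to match the package documentation.

```python
import re
import subprocess
import sys
from pathlib import Path


def numeric_key(name):
    """Sort key: leading integer in the file name, then the name itself."""
    match = re.match(r"(\d+)", name)
    # Files without a numeric prefix sort after all numbered files.
    return (int(match.group(1)) if match else float("inf"), name)


def ordered_scripts(code_dir):
    """Return the .py files in code_dir sorted by their numeric prefix."""
    return sorted(Path(code_dir).glob("*.py"), key=lambda p: numeric_key(p.name))


def run_all(code_dir="code"):
    """Run each script in order from inside code_dir; stop at the first failure."""
    # "code" is an assumed folder name; pass the real path if it differs.
    for script in ordered_scripts(code_dir):
        print(f"Running {script.name}")
        subprocess.run([sys.executable, script.name], cwd=script.parent, check=True)
```

Calling `run_all("code")` from the extracted archive root would then execute the scripts one after another. Sorting on the integer value of the prefix, rather than on the raw file name, keeps a script numbered 10 from running before one numbered 2.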
Institutions
- Wuhan University of Science and Technology, Wuhan, Hubei