Raw dataset and R scripts for: Unravelling spatial drivers of topsoil total carbon variability in tropical paddy soils of Sri Lanka

Published: 20 December 2023 | Version 3 | DOI: 10.17632/dnw2v82r8y.3
Contributors:

Description

This dataset comprises the raw data, the raster files of the environmental covariates used for modelling, and the R scripts that document the analysis workflow for the research article entitled: Unravelling the spatial drivers of topsoil total carbon concentration variability in paddy-growing soils in tropical agro-ecosystems of Sri Lanka. The study aimed to identify the spatial drivers of, and to estimate, total carbon (TC) concentration in topsoil (0-15 cm) across paddy-growing regions in tropical climates, using Sri Lanka as a case study. Two distinct sampling strategies were used to collect soil samples for model calibration and validation. For model calibration, 888 locations were selected using conditioned Latin hypercube sampling. A further 99 sites were selected using a design-based stratified random sampling strategy for independent evaluation of the developed models. Total carbon concentration (%) was measured by automated dry combustion using a 2400 Series II CHN Elemental Analyser.
Geospatial modelling of TC concentration was carried out with two distinct random forest models using a range of environmental covariates: mean annual rainfall (Rainfal_N), annual average mean temperature (Temp_N), annual average minimum temperature (Temp_Min_N), annual average maximum temperature (Temp_Max_N), vapour pressure deficit (VPD_N), MODIS enhanced vegetation index (Modis_N), SAGA wetness index (SAGA_WI_N), slope angle (Slope_d_N) and elevation (DEM_N). All environmental covariates were resampled to a spatial resolution of 100 m prior to spatial analysis. Furthermore, we applied a novel area of applicability (AOA) calculation to identify and quantify regions where the predictions are less reliable.
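The AOA idea can be illustrated with a minimal, dependency-free Python sketch of the dissimilarity index (DI) described by Meyer and Pebesma (2021), which the CAST R package implements as `aoa()`. It assumes predictors are standardized with the training mean and standard deviation, that each covariate carries a hypothetical importance weight (in practice taken from the fitted random forest), and that every calibration site has a cross-validation fold label. The function and argument names are illustrative and are not taken from the authors' R scripts; the threshold rule used here (largest training DI not above Q3 + 1.5 × IQR) only approximates CAST's boxplot-based threshold.

```python
import math
import statistics


def area_of_applicability(train, new, weights, folds):
    """Sketch of the AOA dissimilarity index (DI) for lists of covariate rows.

    train:   covariate rows of the calibration sites (e.g. [Temp_N, DEM_N, ...])
    new:     covariate rows of the locations to be predicted
    weights: hypothetical variable-importance weight per covariate
    folds:   cross-validation fold label of each calibration site
    Returns (di_new, threshold, inside_aoa).
    """
    # 1. Standardize with the training mean and sd, then apply importance weights.
    cols = list(zip(*train))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.stdev(c) or 1.0 for c in cols]  # guard against constant columns

    def scale(rows):
        return [[(v - m) / s * w for v, m, s, w in zip(r, means, sds, weights)]
                for r in rows]

    tr, nw = scale(train), scale(new)

    # 2. Mean pairwise distance between calibration sites normalizes the DI.
    n = len(tr)
    mean_dist = statistics.fmean(
        [math.dist(tr[i], tr[j]) for i in range(n) for j in range(i + 1, n)])

    # 3. DI of each new location: distance to its nearest calibration site.
    di_new = [min(math.dist(p, t) for t in tr) / mean_dist for p in nw]

    # 4. Training DI, measured only against calibration sites in other CV folds.
    di_train = [
        min(math.dist(tr[i], tr[j]) for j in range(n) if folds[j] != folds[i])
        / mean_dist
        for i in range(n)
    ]

    # 5. Threshold: largest training DI within the upper boxplot whisker.
    q1, _, q3 = statistics.quantiles(di_train, n=4, method="inclusive")
    upper = q3 + 1.5 * (q3 - q1)
    threshold = max(d for d in di_train if d <= upper)

    return di_new, threshold, [d <= threshold for d in di_new]


# Toy example: six calibration sites with two covariates in three CV folds.
train = [[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5], [0.2, 0.8]]
di, thr, aoa = area_of_applicability(
    train, [[0.5, 0.5], [5, 5]], [1.0, 1.0], [0, 0, 1, 1, 2, 2])
print(aoa)  # [True, False]
```

A location whose covariates match a calibration site gets DI = 0 and falls inside the AOA, while a location far outside the sampled covariate space exceeds the threshold and is flagged as less reliable.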
In addition to the AOA analysis, the uncertainty of TC predictions (%) was quantified using a 90% prediction interval. The influence of increasing the number of calibration sites on prediction quality and reliability was assessed using a user-defined sequence of calibration sample sizes (e.g. n=200, n=300, n=400, n=500, n=600, n=700, n=800, n=888). For more information on the study area, sampling design, analytical data generation, modelling and interpretation of the data, please refer to the research article mentioned above.
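The description does not state how the 90% prediction interval was derived. One common option for random forests is a quantile regression forest, where the interval runs from the 5th to the 95th percentile of the predictive distribution at each location. The Python sketch below assumes each location already has a list of predicted TC values (for example per-tree or bootstrap predictions, hypothetical here) and shows how the interval limits, the mean interval width and the prediction interval coverage probability (PICP) on validation sites could be computed. A well-calibrated 90% interval should give a PICP close to 90%. All function names and values are illustrative.

```python
import statistics


def interval_90(samples):
    """5th and 95th percentiles of the predicted TC values at one location."""
    cuts = statistics.quantiles(samples, n=20, method="inclusive")  # 5%, 10%, ..., 95%
    return cuts[0], cuts[-1]


def picp(observed, intervals):
    """Percentage of observed TC values that fall inside their interval."""
    inside = sum(lo <= y <= hi for y, (lo, hi) in zip(observed, intervals))
    return 100.0 * inside / len(observed)


def mean_width(intervals):
    """Mean prediction interval width, in the units of TC (%)."""
    return statistics.fmean([hi - lo for lo, hi in intervals])


# Toy example: 101 hypothetical TC predictions (%) at each of three validation sites.
samples = [1.0 + i / 100 for i in range(101)]  # 1.00, 1.01, ..., 2.00
intervals = [interval_90(samples)] * 3
observed = [1.1, 1.5, 2.0]                     # hypothetical measured TC (%)
print([round(v, 2) for v in intervals[0]])     # [1.05, 1.95]
print(round(mean_width(intervals), 2))         # 0.9
print(round(picp(observed, intervals), 1))     # 66.7
```

Here two of the three hypothetical observations fall inside their intervals, so the PICP of about 66.7% would indicate intervals that are too narrow for a nominal 90% level.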

Institutions

  • University of Sydney
  • Commonwealth Scientific and Industrial Research Organisation
  • National Institute of Fundamental Studies

Categories

Machine Learning, Spatial Modeling, Soil Carbon, Sri Lanka, Paddy Soil, Digital Soil Mapping
