Dataset for 'Satellite based Budyko framework reveals the human imprint on long-term surface water partitioning across India'

Published: 22-02-2021 | Version 1 | DOI: 10.17632/w84gwxgbtm.1
Anav Vora,


The folder 'Fitting_Budyko_Parameter' contains code and data for obtaining calibrated values of the Tixeront-Fu parameter omega from long-term mean annual precipitation, potential evapotranspiration, and actual evapotranspiration. Code for a split-sample test, which checks the performance of the omega calibration, is in the same folder.

Data for the remainder of the analysis are extracted from 'ProcessedData.xlsx', which contains all the physio-climatic and socio-economic characteristics used in the analysis. 'CorrelationAnalysis.r' generates correlograms for identifying correlated characteristics. Intra-category correlations are first identified from these correlograms to shortlist characteristics; remaining correlations are then eliminated by examining a combined correlogram containing factors from all categories.

'Main_Code.r' calls the functions 'CART_func.r' and 'RF_func.r' iteratively for each omega threshold defined in 'CART_RF_Model_Thresholds.xlsx'. 'CART_func.r' outputs pruned and unpruned trees for class prediction; pruning uses the 1-SE rule (Breiman et al., 1984). The pruned trees are used to develop a circos plot using the online tool on ‘’. 'RF_func.r' returns the predictive importance of each factor, quantified by permutation-based variable importance, and generates confusion matrices indicating the classification performance of each trained random forest model. Outputs are written to the 'Outputs' folder, in appropriately named sub-folders.
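The omega calibration can be illustrated with a minimal Python sketch. It assumes Fu's equation in its common form, AET/P = 1 + PET/P - (1 + (PET/P)^omega)^(1/omega), and solves for omega by bisection. The function names and the search bracket are hypothetical; the scripts in 'Fitting_Budyko_Parameter' may use a different formulation or solver.

```python
def fu_evaporative_index(aridity: float, omega: float) -> float:
    """Fu's curve: AET/P as a function of the aridity index PET/P and omega (> 1)."""
    return 1.0 + aridity - (1.0 + aridity ** omega) ** (1.0 / omega)


def calibrate_omega(p: float, pet: float, aet: float,
                    lo: float = 1.0 + 1e-9, hi: float = 50.0) -> float:
    """Return the omega for which Fu's curve reproduces the observed AET/P.

    p, pet and aet are long-term mean annual values in the same units (e.g. mm/yr).
    The bracket [lo, hi] is an illustrative assumption (hypothetical defaults).
    """
    aridity = pet / p
    target = aet / p
    # AET/P rises monotonically with omega, from 0 (omega -> 1) towards
    # min(1, PET/P) (omega -> infinity), so bracketed bisection is sufficient.
    if not fu_evaporative_index(aridity, lo) < target < fu_evaporative_index(aridity, hi):
        raise ValueError("observed AET/P cannot be matched with omega in [lo, hi]")
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if fu_evaporative_index(aridity, mid) < target:
            lo = mid  # curve lies below the observation: omega must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Example: recover a known omega from synthetic long-term means.
p_mean, pet_mean = 1000.0, 1500.0
aet_mean = p_mean * fu_evaporative_index(pet_mean / p_mean, 2.6)
print(round(calibrate_omega(p_mean, pet_mean, aet_mean), 6))
```

Bisection is used here because the curve is monotonic in omega, which makes a bracketed search robust without derivatives; a library root finder would serve equally well.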
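A split-sample test of the calibration might look like the sketch below. It assumes a temporal split (first half of the years for calibration, second half for validation) and reports the relative error of the predicted long-term mean AET; both the split and the error metric are assumptions, not necessarily what the folder's code does. Fu's equation and the bisection calibration are repeated so the block runs on its own.

```python
from statistics import mean


def fu_evaporative_index(aridity: float, omega: float) -> float:
    """Fu's curve: AET/P as a function of the aridity index PET/P and omega."""
    return 1.0 + aridity - (1.0 + aridity ** omega) ** (1.0 / omega)


def calibrate_omega(p: float, pet: float, aet: float,
                    lo: float = 1.0 + 1e-9, hi: float = 50.0) -> float:
    """Bisection for the omega that reproduces the observed AET/P."""
    aridity, target = pet / p, aet / p
    if not fu_evaporative_index(aridity, lo) < target < fu_evaporative_index(aridity, hi):
        raise ValueError("observed AET/P cannot be matched with omega in [lo, hi]")
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if fu_evaporative_index(aridity, mid) < target:
            lo = mid  # omega must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


def split_sample_test(p: list[float], pet: list[float],
                      aet: list[float]) -> tuple[float, float]:
    """Hypothetical temporal split: calibrate on the first half of the years,
    validate on the second. Returns (omega, relative error of predicted mean AET)."""
    if not len(p) == len(pet) == len(aet) or len(p) < 2:
        raise ValueError("need equally long annual series covering at least two years")
    half = len(p) // 2
    omega = calibrate_omega(mean(p[:half]), mean(pet[:half]), mean(aet[:half]))
    p_val, pet_val = mean(p[half:]), mean(pet[half:])
    aet_predicted = p_val * fu_evaporative_index(pet_val / p_val, omega)
    aet_observed = mean(aet[half:])
    return omega, (aet_predicted - aet_observed) / aet_observed


# Synthetic annual series (mm/yr) whose two halves have identical means.
omega, rel_error = split_sample_test(
    [1000.0, 1200.0, 1000.0, 1200.0],
    [1500.0, 1300.0, 1500.0, 1300.0],
    [700.0, 800.0, 700.0, 800.0],
)
print(omega, rel_error)
```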
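The correlogram screening is done by inspection in 'CorrelationAnalysis.r', but the underlying check can be sketched as flagging pairs of characteristics whose Pearson correlation exceeds a cut-off. The 0.7 threshold and the characteristic names below are hypothetical; the same function could be applied within each category and then to the combined set, mirroring the two-stage procedure described above.

```python
import math
from itertools import combinations


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant characteristic")
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(table: dict[str, list[float]],
                     threshold: float = 0.7) -> list[tuple[str, str, float]]:
    """List (name_a, name_b, r) for every pair of characteristics with |r| >= threshold."""
    pairs = []
    for a, b in combinations(sorted(table), 2):
        r = pearson(table[a], table[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs


# Hypothetical catchment characteristics (one value per catchment).
characteristics = {
    "mean_elevation": [1.0, 2.0, 3.0, 4.0],
    "mean_slope": [2.0, 4.0, 6.0, 8.0],
    "irrigated_fraction": [4.0, 1.0, 3.0, 2.0],
}
for a, b, r in correlated_pairs(characteristics):
    print(f"{a} ~ {b}: r = {r:.2f}")
```

The sketch only flags candidate pairs; deciding which member of a pair to keep remains a judgement made while reading the correlograms.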
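The 1-SE pruning rule selects the smallest tree whose cross-validated error lies within one standard error of the minimum cross-validated error. The sketch below applies it to a cost-complexity table whose columns mirror the cp table reported by R's rpart package (CP, nsplit, xerror, xstd); whether 'CART_func.r' uses rpart is an assumption, and the table values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class CpRow:
    """One row of a cost-complexity table."""
    cp: float      # complexity parameter
    nsplit: int    # number of splits in the subtree
    xerror: float  # cross-validated relative error
    xstd: float    # standard error of xerror


def one_se_cp(table: list[CpRow]) -> float:
    """1-SE rule: return the cp of the smallest tree whose cross-validated
    error is within one standard error of the minimum."""
    best = min(table, key=lambda row: row.xerror)
    limit = best.xerror + best.xstd
    eligible = [row for row in table if row.xerror <= limit]
    return min(eligible, key=lambda row: row.nsplit).cp


# Illustrative table (values invented for the example).
cp_table = [
    CpRow(cp=0.30, nsplit=0, xerror=1.00, xstd=0.05),
    CpRow(cp=0.10, nsplit=1, xerror=0.70, xstd=0.05),
    CpRow(cp=0.02, nsplit=3, xerror=0.55, xstd=0.04),
    CpRow(cp=0.01, nsplit=5, xerror=0.52, xstd=0.04),
]
print(one_se_cp(cp_table))
```

Here the minimum error is 0.52 with a standard error of 0.04, so any tree with xerror at most 0.56 qualifies, and the 3-split tree is chosen over the 5-split one. In rpart, the selected value would then be passed to `prune(tree, cp = ...)`.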
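Permutation variable importance measures how much a trained model's accuracy drops when the values of one factor are randomly shuffled, breaking its link to the class labels. The sketch below uses a toy rule in place of a trained random forest, and the factor names are hypothetical. Random forest implementations often compute this on out-of-bag samples; for simplicity the sketch evaluates on the supplied data.

```python
import random


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of correctly classified samples."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X: dict[str, list[float]], y: list[str],
                           n_repeats: int = 10, seed: int = 0) -> dict[str, float]:
    """Mean drop in accuracy when one column of X is randomly shuffled.

    predict maps a dict of columns to a list of predicted class labels.
    """
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importance = {}
    for name in X:
        drops = []
        for _ in range(n_repeats):
            shuffled = list(X[name])
            rng.shuffle(shuffled)
            permuted = {**X, name: shuffled}  # other columns left untouched
            drops.append(baseline - accuracy(y, predict(permuted)))
        importance[name] = sum(drops) / n_repeats
    return importance


def toy_model(columns: dict[str, list[float]]) -> list[str]:
    """Stand-in for a trained model: the class depends only on 'aridity'."""
    return ["high" if a > 1.0 else "low" for a in columns["aridity"]]


X = {
    "aridity": [0.5 + 0.1 * i for i in range(20)],
    "noise": [float(i % 7) for i in range(20)],
}
y = toy_model(X)
scores = permutation_importance(toy_model, X, y)
print(scores)
```

Because the toy model ignores `noise`, shuffling it leaves accuracy unchanged and its importance is zero, while shuffling `aridity` degrades the predictions.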
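For each omega threshold, catchments can be assigned to classes and a model's predictions summarised in a confusion matrix. The sketch assumes a binary labelling ('high' if omega is at or above the threshold, else 'low'); this rule, the threshold value and the predictions are hypothetical, and 'Main_Code.r' would repeat the procedure for every threshold listed in 'CART_RF_Model_Thresholds.xlsx'.

```python
from collections import Counter


def classes_for_threshold(omegas: list[float], threshold: float) -> list[str]:
    """Hypothetical labelling: 'high' if omega >= threshold, else 'low'."""
    return ["high" if w >= threshold else "low" for w in omegas]


def confusion_matrix(y_true: list[str], y_pred: list[str],
                     classes: list[str]) -> list[list[int]]:
    """Rows are observed classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in classes] for t in classes]


omegas = [1.5, 2.8, 3.1, 2.0]
observed = classes_for_threshold(omegas, threshold=2.5)
predicted = ["low", "high", "low", "low"]  # e.g. output of a trained classifier
matrix = confusion_matrix(observed, predicted, ["low", "high"])
overall_accuracy = sum(matrix[i][i] for i in range(len(matrix))) / len(observed)
print(matrix, overall_accuracy)
```

The diagonal holds correctly classified catchments, so overall accuracy is the diagonal sum divided by the number of catchments.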