Karachi Environmental Datasets (2015–2024): Weather, Soil, and Urban Tree Species Traits
Description
This dataset provides three integrated environmental data resources for Karachi, Pakistan, developed to support urban climate research, soil suitability assessment, and climate-resilient urban forestry planning.
The first dataset contains monthly aggregated meteorological records for 100 georeferenced locations across Karachi from January 2015 to December 2024. Variables include mean air temperature (°C), relative humidity (%), and wind speed (km/h). Raw hourly or daily weather data were retrieved via a publicly accessible API and aggregated to monthly averages. The dataset supports long-term climate trend analysis, seasonal variability studies, and spatial comparisons across urban zones.
The second dataset consists of soil properties for 104 georeferenced locations extracted from the SoilGrids v2 API (0–5 cm depth). Variables include organic carbon density (kg/m³), pH, clay (%), sand (%), and bulk density (g/cm³). SoilGrids raw values were converted into standard scientific units using documented scaling factors. Locations were grouped into 20 spatial clusters representing broader soil zones, and six dominant soil-type patterns were identified across Karachi. Soil predictions were compared with published field observations (Naz et al., 2019), showing strong agreement in pH gradients, texture distributions, and organic matter ranges, with an estimated overall reliability of ~87%.
The third dataset is a structured trait database of 30 commonly planted urban tree species in Karachi, containing 49 ecological, morphological, and environmental tolerance attributes. Traits include growth rate, canopy development, soil and salinity tolerance, climatic thresholds, drought resistance, pollution tolerance, and urban heat resilience. Data were compiled from horticultural literature, the FAO EcoCrop database, municipal records, and field observations, and were validated by botanical experts.
Together, these datasets provide an integrated foundation for urban climate modeling, soil suitability assessment, species selection analysis, machine learning applications, and green infrastructure planning in arid coastal megacities.
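The soil unit conversion can be illustrated with a minimal sketch that applies the divisors listed under Steps to reproduce. The function name `convert_soil_record` is hypothetical, the dictionary keys follow SoilGrids property codes and may not match the column names in the published CSV, and the example values are invented for illustration rather than taken from the dataset.

```python
# Divisors that map SoilGrids scaled values to conventional units.
# The factors follow the conversions stated in this record; the keys are
# SoilGrids property codes and are an assumption about naming.
SCALE_DIVISORS = {
    "phh2o": 10,   # pH x 10 -> pH
    "clay": 10,    # g/kg -> %
    "sand": 10,    # g/kg -> %
    "bdod": 100,   # cg/cm^3 -> g/cm^3
    "ocd": 10,     # hg/m^3 -> kg/m^3
}


def convert_soil_record(raw):
    """Return a copy of `raw` with scaled properties converted.

    Keys without a known divisor and missing (None) values pass through
    unchanged.
    """
    converted = {}
    for key, value in raw.items():
        divisor = SCALE_DIVISORS.get(key)
        if divisor is None or value is None:
            converted[key] = value
        else:
            converted[key] = value / divisor
    return converted


# Illustrative raw values only; not taken from the dataset.
example = {"phh2o": 78, "clay": 215, "sand": 540, "bdod": 142, "ocd": 96}
converted = convert_soil_record(example)
```

Keeping the divisors in one table makes it easy to check them against the SoilGrids documentation before reusing the conversion.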
Steps to reproduce
This dataset was compiled using a structured multi-source workflow:
Weather Dataset: Meteorological data were retrieved via the Open-Meteo historical weather API for 100 predefined latitude–longitude coordinates across Karachi. Hourly or daily observations were programmatically extracted and aggregated into monthly means for the period January 2015 to December 2024. Aggregation was performed using arithmetic averaging. Minor missing daily values (<10% per month) were filled using linear interpolation. All timestamps were standardized to Pakistan Standard Time (UTC+5). Final outputs were exported as CSV.
Soil Dataset: Soil properties were extracted using the SoilGrids v2 REST API (ISRIC – World Soil Information) for 104 georeferenced sampling points. Values were retrieved for the 0–5 cm depth layer. SoilGrids provides scaled outputs; therefore, documented unit conversions were applied (e.g., pH divided by 10, clay and sand converted from g/kg to %, bulk density from cg/cm³ to g/cm³, organic carbon density from hg/m³ to kg/m³). Sampling locations were grouped into 20 spatial clusters based on geographic proximity to represent broader soil zones. Cluster-level summaries were used to characterize dominant soil types. To assess reliability, SoilGrids-derived values were compared with field-measured soil characteristics reported by Naz et al. (2019), who sampled 30 sites across six habitat types around Karachi. Organic matter percentage was approximated from organic carbon density using a standard conversion factor. Agreement was evaluated using normalized mean error calculations.
Tree Species Trait Dataset: Trait information for 30 commonly planted urban tree species was compiled from the FAO EcoCrop database, horticultural references, municipal plantation records, and field observations. Traits were standardized into binary and numerical formats for machine-readability. Growth rate was encoded on a 1–3 scale (slow to fast). All values were harmonized and reviewed by botanical experts for ecological consistency.
All datasets were structured as CSV files and uploaded to Mendeley Data with DOI assignment.
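The weather aggregation step (linear interpolation of minor gaps followed by arithmetic averaging) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the input is assumed to be one month of daily values with `None` marking gaps, gaps at the start or end of a month are assumed to take the nearest observed value, and months at or above the 10% missing threshold are assumed to be returned as `None`, since the record does not say how such cases were handled.

```python
def fill_linear(values):
    """Fill None gaps by linear interpolation between neighbouring values.

    Assumption: gaps at either edge take the nearest observed value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no observations to interpolate from")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


def monthly_mean(daily, max_missing=0.10):
    """Arithmetic mean of one month of daily values after gap filling.

    Assumption: a month whose missing fraction is at or above
    `max_missing` yields None instead of a mean.
    """
    missing_fraction = sum(v is None for v in daily) / len(daily)
    if missing_fraction >= max_missing:
        return None
    filled = fill_linear(daily)
    return sum(filled) / len(filled)


# Illustrative 30-day month with one gap; values are not from the dataset.
daily_temps = [30.0] * 10 + [None] + [32.0] * 19
mean_temp = monthly_mean(daily_temps)
```

Interpolating before averaging keeps each day equally weighted in the monthly mean, whereas simply dropping missing days would shift the weight toward the observed part of the month.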
Institutions
- National University of Computer and Emerging Sciences, Islamabad