Environmental Digital Governance Index for Chinese Cities, 2015–2023
Description
This panel dataset supports investigation of the impact of environmental digital governance (EDG) on urban carbon emission intensity (CEI) across Chinese prefecture-level cities from 2015 to 2023.
Core Variables and Measurement
The primary explanatory variable is the Environmental Digital Governance Index (EDG). It is constructed from 77,514 government procurement contracts (total value ~RMB 106.2 billion) identified via a three-stage text-analysis pipeline: (i) dual-domain keyword expansion covering digital technology and environmental governance; (ii) semantic annotation of 20,000 randomly sampled contracts by a large language model (KIMI); and (iii) LSTM-RoBERTa deep learning classification (accuracy: 92.62%; F1: 0.9270) applied to the full sample. Contract values are aggregated at the city-year level and normalized by regional GDP to measure governance intensity; cumulative investment is used to account for the long service life of digital infrastructure.
The dependent variable, Carbon Emission Intensity (CEI), is derived from the EDGAR v2025 global emission inventory (European Commission JRC / PBL). Annual gridded CO₂ emissions at ~10×10 km resolution are spatially aggregated to prefecture-level boundaries via GIS zonal statistics and normalized by city-level real GDP.
Sample and Scope
The dataset covers an unbalanced panel of 2,619 city-year observations spanning 291 prefecture-level cities. It excludes municipalities directly under the central government and cities with boundary adjustments or missing data.
Control Variables
City-characteristic controls include per capita GDP, urbanization rate, land area, industrial structure upgrading (tertiary/secondary ratio), financial development, trade openness, and green innovation (green patent applications).
Government-behavior controls include fiscal expenditure/GDP, R&D expenditure ratio, environmental regulation intensity (text-mined from government work reports), and policy dummies for carbon emission trading pilots and energy conservation fiscal policy pilots.
Files Included
The repository contains the full panel dataset in Stata format (.dta), together with replication code for the baseline regressions, endogeneity tests (IV, PSM, DML, DML-IV), mechanism analyses, and heterogeneity analyses reported in the associated manuscript.
Potential Applications
This dataset is suitable for research in environmental economics, digital governance, climate policy evaluation, and urban sustainability. Researchers may use it to analyze the carbon-reduction effects of digital regulatory tools, to examine heterogeneous policy impacts across technological hierarchies and institutional contexts, or as a benchmark for alternative measures of government digital transformation.
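The EDG aggregation described under Core Variables and Measurement (contract values summed by city-year, accumulated over time, and scaled by GDP) can be illustrated with a minimal Python sketch. This is not the authors' replication code, which ships in Stata; the function name, the tuple layout, and the toy numbers are hypothetical, and the exact normalization (cumulative value divided by same-year GDP) is an assumption about how the index is formed.

```python
from collections import defaultdict

def cumulative_edg_intensity(contracts, gdp):
    """Illustrative only: cumulative contract value per city, divided by GDP.

    contracts: iterable of (city, year, value) for contracts already
               classified as environmental digital governance.
    gdp:       {(city, year): GDP}, defining the city-years in the panel.
    Contracts in city-years absent from `gdp` are ignored (an assumption).
    """
    # Step 1: sum contract values within each city-year cell.
    annual = defaultdict(float)
    for city, year, value in contracts:
        annual[(city, year)] += value

    # Step 2: walk each city's years in ascending order, accumulate
    # investment, and scale the running total by that year's GDP.
    running = defaultdict(float)
    intensity = {}
    for city, year in sorted(gdp):
        running[city] += annual.get((city, year), 0.0)
        intensity[(city, year)] = running[city] / gdp[(city, year)]
    return intensity

# Toy inputs (made-up values, same monetary unit for contracts and GDP).
contracts = [("A", 2015, 2.0), ("A", 2015, 1.0), ("A", 2017, 3.0), ("B", 2016, 4.0)]
gdp = {("A", 2015): 100.0, ("A", 2016): 100.0, ("A", 2017): 120.0, ("B", 2016): 200.0}
edg = cumulative_edg_intensity(contracts, gdp)
```

In this toy example city A has no new contracts in 2016, so its cumulative investment carries forward unchanged from 2015, which is the point of using a cumulative rather than an annual measure for long-lived infrastructure.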
Institutions
- Jinan University, Guangdong, Guangzhou
- Shantou University, Guangdong, Shantou