1980-2024 Europe SDG datasets and 2019-2030 DEA-based scores
Description
SDGTS_DB_NUTS_raw.zip (1980-2024 Europe SDG raw dataset): Time series, with missing data where not available at the Eurostat source, of statistical indicators available for the Sustainable Development Goals (SDG) in Europe at three NUTS levels of geographical distribution, namely regions (NUTS2), supra-regions (NUTS1), and countries (NUTS0), across EU member states, candidate countries, and EFTA members from 1980 to 2024.

SDGTS_DB_NUTS_nona.zip (2019-2024 Europe SDG complete dataset): Time series, with no missing data after imputation, suitable for DEA-based analysis, of statistical indicators for the Sustainable Development Goals (SDG) in Europe at the NUTS2 regional level of geographical distribution across EU member states, candidate countries, and EFTA members from 2019 to 2024.

SDGTS_DB_NUTS_nona_alt.zip (2019-2024 Europe SDG complete dataset): Time series, with no missing data after imputation, suitable for non-DEA-based analysis, of statistical indicators for the Sustainable Development Goals (SDG) in Europe at the NUTS2 regional level of geographical distribution across EU member states, candidate countries, and EFTA members from 2019 to 2024.

README file: Outline of the structure of each dataset, naming conventions, file encoding formats, and definitions and units of all SDG indicators used.

TableA2-SDG_scores_over_time.pdf (2019-2030 Europe SDG scores): DEA-based SDG scores computed as described in Fernández-Macho (2025), DEA-based impact assessment and forecasting: The case of SDG compliance in Europe after Covid-19, doi:10.21203/rs.3.rs-6614879/v1.
Steps to reproduce
The raw dataset (SDGTS_DB_NUTS_raw) compiles time series of SDG indicators across EU member states, candidate countries, and EFTA members from 1980 to 2024. To derive it from the original Eurostat sources, multidimensional code names were rearranged and alphanumerical flags were cleaned. Indicators are expressed in relative units, typically percentages or rates, which were taken from the source where available and calculated otherwise.

To support multivariate analyses that require the absence of missing data, extensive filtering and imputation were performed to arrive at the complete SDG time-series dataset from 2019 to 2024. Missing values were addressed through temporal and geographical imputation, followed by either a conservative (SDGTS_DB_NUTS_nona) or a neutral (SDGTS_DB_NUTS_nona_alt) final imputation to ensure dataset completeness.

Time series of SDG scores for European NUTS2 regions over the period 2019 through 2024 (TableA2-SDG_scores_over_time) were calculated using a DEA-based method. For forecasting purposes, the SDG index scores from all NUTS2 regions were pooled into a single panel dataset, grouped by country to account for the nested data structure, and lagged values were included to capture temporal dynamics. This AR(1) panel data model was estimated within a linear mixed effects framework, where fixed effects capture general temporal trends across regions and random effects account for country-level heterogeneity. Estimation was conducted via Restricted Maximum Likelihood (REML), which reduces the bias in variance-component estimates that ordinary maximum likelihood incurs under the mixed model specification. Forecasts were then computed up to 2030 using the estimated parameters of the dynamic panel model.
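The forecasting step described above can be illustrated with a minimal sketch. It is not the author's code: the software used for the published scores is not stated here, so the choice of Python with statsmodels is an assumption, as are the column names (country, region, year, score, lag_score) and the exact model form, a random country intercept plus a fixed intercept and a fixed coefficient on the lagged score. The panel is synthetic and stands in for the DEA-based scores in TableA2-SDG_scores_over_time.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the pooled NUTS2 panel (NOT the published scores).
rng = np.random.default_rng(0)
rows = []
for c in range(8):                        # hypothetical countries
    u = rng.normal(0.0, 0.03)             # country-level random intercept
    for r in range(5):                    # hypothetical regions per country
        y = 0.6 + rng.normal(0.0, 0.05)   # starting score in 2019
        for year in range(2019, 2025):
            rows.append((f"C{c}", f"C{c}R{r}", year, y))
            y = 0.2 + u + 0.7 * y + rng.normal(0.0, 0.02)
panel = pd.DataFrame(rows, columns=["country", "region", "year", "score"])

# Lag the score within each region to obtain the AR(1) regressor.
panel = panel.sort_values(["region", "year"])
panel["lag_score"] = panel.groupby("region")["score"].shift(1)
train = panel.dropna(subset=["lag_score"])

# Linear mixed model: fixed intercept and lag coefficient,
# random intercept per country, estimated by REML.
model = smf.mixedlm("score ~ lag_score", data=train, groups=train["country"])
fit = model.fit(reml=True)

intercept = fit.fe_params["Intercept"]
phi = fit.fe_params["lag_score"]
# Predicted random intercept for each country.
country_effect = {g: re.iloc[0] for g, re in fit.random_effects.items()}

# Iterate the fitted recursion forward from the last observed year (2024).
last = panel[panel["year"] == 2024].set_index("region")
current = last["score"].copy()
shift = last["country"].map(country_effect)
forecasts = {}
for year in range(2025, 2031):
    current = intercept + shift + phi * current
    forecasts[year] = current
forecast_table = pd.DataFrame(forecasts)  # rows: regions, columns: 2025..2030
```

Each forecast feeds the next one as the lagged value, so uncertainty compounds with the horizon; the sketch returns point forecasts only and does not compute prediction intervals.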