A 50-Year Relational Time-Series of Economic and Demographic Development of 217 World Bank economies (1976-2025): Macroeconomic, Demographic, and Other Developmental Indicators

Published: 14 April 2026 | Version 1 | DOI: 10.17632/244fjjds98.1
Contributors:
Hank Pham

Description

The dataset is a relational, analytics-oriented panel designed to support research on global development patterns across 217 World Bank economies from 1976 through 2025. It consolidates a wide range of indicators from the World Bank and the United Nations into a consistent country-year structure, making it suitable for econometric modeling, trend analysis, policy benchmarking, and cross-domain feature engineering. Most indicator tables use a composite key of Country_ISO3_Code + Observation_Year, which enables clean joins across domains without reshaping source files each time.

Core indicator domains include:
- Macroeconomics: GDP, GDP per capita PPP, inflation, unemployment
- Demographics: population, life expectancy, age distribution, urbanization
- Investment & Infrastructure: capital formation, electricity and internet access, broadband/mobile penetration, energy use, transport proxies
- Foreign Investment & External Finance: FDI inflows/outflows, remittances, ODA, external debt
- Education Investment & Outcomes: education spending, completion/literacy metrics, enrollment levels, and gender parity indicators

To support interpretability and governance, the dataset separates descriptive reference entities from measurements via dimension-style tables such as continent, lending category, and income group. A dedicated historical classification table tracks how each economy’s income group changes over time, allowing analysts to study structural transitions (for example, low-income to middle-income movement) while preserving period context.

Although the time window is broad and standardized, users should treat it as a bounded panel rather than a perfectly complete matrix: some countries or indicators may have missing values in specific years, depending on source availability. Overall, the dataset is built to be SQL-friendly, reproducible, and directly usable for downstream BI dashboards, forecasting workflows, and academic-grade comparative analysis.
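
The composite-key join described above can be sketched with pandas as follows. This is a minimal illustration, not code taken from transform.py: only the key columns Country_ISO3_Code and Observation_Year come from the description, while the value columns (GDP_Current_USD, Population_Total), the country codes AAA/BBB, and all numbers are hypothetical toy values.

```python
import pandas as pd

# Composite key named in the dataset description.
KEY = ["Country_ISO3_Code", "Observation_Year"]

# Toy rows only (hypothetical codes and values, not real observations).
macro = pd.DataFrame({
    "Country_ISO3_Code": ["AAA", "AAA", "BBB"],
    "Observation_Year": [2019, 2020, 2020],
    "GDP_Current_USD": [100.0, 110.0, 50.0],   # hypothetical column name
})
demo = pd.DataFrame({
    "Country_ISO3_Code": ["AAA", "BBB", "BBB"],
    "Observation_Year": [2020, 2019, 2020],
    "Population_Total": [10, 20, 21],          # hypothetical column name
})


def join_on_country_year(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Outer-join two indicator tables on the country-year key.

    validate="one_to_one" makes pandas raise if either table contains a
    duplicated country-year pair, which would silently multiply rows.
    """
    merged = left.merge(right, on=KEY, how="outer", validate="one_to_one")
    return merged.sort_values(KEY).reset_index(drop=True)


panel = join_on_country_year(macro, demo)
print(panel)
```

An outer join is used here so that country-years present in only one table are kept with missing values, which matches the note that the panel is not a perfectly complete matrix. An inner join would be the stricter alternative when only fully observed country-years are wanted.
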
Overview of the included files:
- The “/data” folder contains the original (raw) datasets.
- The “/Final_Dataset_20260411_144349” folder contains the processed datasets.
- “transform.py” converts the raw datasets into clean, relational datasets.
- “ER_Diagram.png” shows an overview of the entities and the relationships between them.
- “create_tables.sql” and “ER_model.mwb” are used to model the final data structures in MySQL Workbench.

Files

Steps to reproduce

To accurately reproduce this database, researchers must extract four raw datasets from the following portals:

1. World Bank: World Development Indicators (WDI)
   - Link: https://databank.worldbank.org/source/world-development-indicators
   - Data to Collect: Select all available countries. Under "Series", check the 36 specific indicators spanning Macroeconomics, Demographics, Infrastructure, Foreign Investment, and Education (e.g., GDP current US$, FDI net inflows). Under "Time", check the "50 Years" box to capture the historic window. Export the selection as a CSV.
2. World Bank: Historical Classifications (OGHIST)
   - Link: https://datahelpdesk.worldbank.org/knowledgebase/articles/906519-world-bank-country-and-lending-groups
   - Data to Collect: Click "historical classification by income" to download the OGHIST Excel file. The required data is located on the "Country Analytical History" sheet only, which tracks the annual income transitions (L, LM, UM, H) required for the relational classifications.
3. World Bank: Current Classifications (CLASS)
   - Link: (Same link as above)
   - Data to Collect: Click "current classification by income" to download the CLASS file. Extract the static Region and Lending Category classifications from the "List of economies" sheet to build the base dimension tables.
4. UN: World Population Prospects (WPP)
   - Link: https://population.un.org/wpp/Download/Standard/CSV/
   - Data to Collect: Locate the "Demographic Indicators" section and download the "Medium Variant" dataset as a CSV. This file provides the absolute population headcounts and life expectancy metrics.

Place the four raw CSV files (Dim_Country_CLASS_2025.csv, OGHIST_2026_03_10.csv, Fact_Demographics_WPP2024_Demographic_Indicators_Medium.csv, and World_Bank_Bulk_Data.csv) into the project's "/data" folder. Note that sheets provided in XLSX format must first be exported to CSV, and the files must be renamed exactly as listed here. Then execute the provided Python pipeline script, transform.py.
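
Before running the pipeline, it may help to confirm that the four renamed files are in place. The check below is a small sketch and is not part of transform.py; the function name missing_raw_files is hypothetical, and the relative folder name "data" assumes the script is run from the project root.

```python
from pathlib import Path

# File names exactly as listed in the steps to reproduce.
REQUIRED_FILES = [
    "Dim_Country_CLASS_2025.csv",
    "OGHIST_2026_03_10.csv",
    "Fact_Demographics_WPP2024_Demographic_Indicators_Medium.csv",
    "World_Bank_Bulk_Data.csv",
]


def missing_raw_files(data_dir="data"):
    """Return the required raw files that are not present in data_dir.

    Hypothetical helper: "data" is assumed to be the project's /data folder
    relative to the current working directory.
    """
    root = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

Calling `missing_raw_files()` from the project root returns an empty list when all four files are present, and otherwise lists the names that still need to be downloaded, exported, or renamed.
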
The deterministic pipeline automates the following steps:
- Standardizes data formats and builds dimension tables (Continents, Lending Categories, Income Groups) with integer IDs.
- Reshapes the World Bank indicator and historical income matrices into normalized formats.
- Merges UN demographic headcounts with World Bank age/urban metrics.
- Enforces a strict temporal/geographic scope across all data, applies SQL-compliant data types, and exports 10 finalized, relational CSVs into a timestamped directory aligned with the database schema.

Tools used: Python 3, pandas, NumPy, file-system utilities (os, datetime, pathlib-style workflow), and MySQL Workbench for relational schema definition and loading targets.
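
Two of the pipeline steps listed above, reshaping a wide indicator matrix into a normalized country-year format and building a dimension table with integer IDs, can be sketched as follows. This is an illustration under stated assumptions, not the code in transform.py: it assumes the export has one column per year labelled like "2019 [YR2019]" and marks missing values with "..", and the function names, the ID column names, and the toy rows (AAA, BBB) are hypothetical.

```python
import pandas as pd

# Toy wide-format rows (hypothetical country codes and values).
# Assumed layout: one column per year, ".." for missing observations.
wide = pd.DataFrame({
    "Country Code": ["AAA", "BBB"],
    "Series Code": ["NY.GDP.MKTP.CD", "NY.GDP.MKTP.CD"],
    "2019 [YR2019]": ["100.0", ".."],
    "2020 [YR2020]": ["110.0", "55.0"],
})


def wide_to_long(df: pd.DataFrame) -> pd.DataFrame:
    """Melt year columns into rows keyed by country code and year."""
    id_cols = ["Country Code", "Series Code"]
    long = df.melt(id_vars=id_cols, var_name="year_label", value_name="raw_value")
    # The first four characters of the assumed label hold the year.
    long["Observation_Year"] = long["year_label"].str.slice(0, 4).astype(int)
    # Non-numeric placeholders such as ".." become NaN.
    long["Value"] = pd.to_numeric(long["raw_value"], errors="coerce")
    long = long.rename(columns={"Country Code": "Country_ISO3_Code"})
    return long[["Country_ISO3_Code", "Series Code", "Observation_Year", "Value"]]


def build_dimension(values, id_col: str, name_col: str) -> pd.DataFrame:
    """Assign 1-based integer IDs to the sorted distinct values."""
    names = sorted(pd.Series(values).dropna().unique())
    return pd.DataFrame({id_col: list(range(1, len(names) + 1)), name_col: names})


long_table = wide_to_long(wide)
# Income codes L, LM, UM, H as named in the OGHIST step; column names are hypothetical.
income_dim = build_dimension(
    ["L", "LM", "UM", "H", "L"], "Income_Group_ID", "Income_Group_Code"
)
print(long_table)
print(income_dim)
```

Sorting the distinct values before numbering keeps the IDs stable across runs on the same input, which is one simple way to keep such a step deterministic.
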

Institutions

Categories

Economy, Education, Macroeconomics, Human Development

Licence