Cross-National Panel of Economic, Demographic, and Investment Indicators: Normalized Tables Derived from World Bank WDI and UN World Population Prospects (DATA 6200, Spring 2026)

Published: 14 April 2026| Version 1 | DOI: 10.17632/ts57mnpv78.1
Contributors:
Andrew Zalev,

Description

This deposit provides a normalized, relational representation of country-level economic, demographic, and investment-related measures suitable for teaching and research in data management, longitudinal analysis, and international development. The database integrates two authoritative public sources. Macroeconomic and investment-style series are derived from the World Bank’s World Development Indicators (WDI) and related World Bank country metadata, which supply comparable cross-country time series with documented definitions and units. Demographic series are derived from the United Nations Department of Economic and Social Affairs, Population Division, World Population Prospects (WPP), which provides internationally harmonized demographic estimates and projections by location and time period. By separating reference data (countries, indicators, and extract provenance) from long-format observations, the schema avoids redundant storage of labels and units, supports referential integrity, and makes it straightforward to append new years or indicators without restructuring tables. The normalized logical model emphasizes third normal form (3NF): each non-key attribute depends only on the key of its table, and repeating groups (for example, one column per year in a spreadsheet) are unpivoted into atomic rows keyed by country, indicator, and year. A single fact-style observation table stores numeric values with foreign keys to country and indicator dictionaries, while a data extract table records when files were downloaded and from which URLs, supporting reproducibility and auditability. Optional views can present wide “panel” layouts (one row per country-year) for common indicator bundles such as GDP, inflation, unemployment, and current account balance, alongside demographic measures such as total population, population growth, median age, and life expectancy, and investment-related measures such as government education expenditure, current health expenditure, foreign direct investment, and gross fixed capital formation. The dataset is intended for coursework scenarios including SQL querying, integrity constraint design, ETL from CSV, and documentation of data lineage. Users should verify indicator definitions against the original provider documentation before drawing substantive scientific conclusions, because revisions to official series and boundary changes can affect historical values. All intellectual property rights in the underlying statistics remain with their respective publishers; this deposit documents the transformation and organization applied to create a reproducible, normalized derivative suited to database systems coursework.

Files

Steps to reproduce

Scope and design decisions: Fix the analytical grain to one row = one country, one indicator, and one calendar year (or aligned WPP reference year). Use ISO 3166-1 alpha-3 codes as the primary country key for World Bank data. For UN WPP, obtain the official location identifier and build a mapping table from WPP location ID/name to ISO3, resolving ambiguities manually or via published correspondence tables. World Bank country reference (entity: Countries). Download country metadata including code, name, region, income group, lending type, and capital city via World Bank Data API (JSON) or bulk CSV: Load into a 'country' table with primary key on 'country_code' (CHAR(3)). Include attributes: 'country_name', 'region', 'income_group', 'lending_type', 'capital_city'. Normalize text fields and enforce code uniqueness. World Bank indicators (entities: Economic and Investment indicators). Download long-format CSVs or use the API for key indicators: GDP (current US$) NY.GDP.MKTP.CD; inflation (annual %) FP.CPI.TOTL.ZG; unemployment (% of labor force) SL.UEM.TOTL.ZS; current account balance BN.CAB.XOKA.CD; government education expenditure SE.XPD.TOTL.GD.ZS; current health expenditure SH.XPD.CHEX.GD.ZS; foreign direct investment BX.KLT.DINV.CD.WD; gross fixed capital formation NE.GDI.TOTL.CD. Create an 'indicator' dictionary table keyed by 'indicator_code' with 'indicator_name' and 'unit_or_note'. Track each download in 'data_extract' with 'source_url', 'download_date', and 'source_label'. Unpivot years into series_observation ('country_code', 'indicator_code', 'year', 'value', 'extract_id'). Use composite primary key on ('country_code', 'indicator_code', 'year') and foreign keys to 'country' and 'indicator'. UN WPP demographics (entity: Demographic indicators). Download WPP CSVs for total population, growth rate, median age, and life expectancy. Standardize location keys and years. Store as additional rows in series_observation using synthetic codes (e.g., UN_WPP_POP_TOTAL) or in a parallel 'un_wpp_observation' table. Maintain consistent 'indicator' definitions and provenance in 'data_extract'. Quality checks: Remove/flag duplicate keys, validate year ranges, and spot-check values for selected countries against original web tables. Ensure referential integrity: every observation links to valid 'country', 'indicator', and 'extract_id'. Tools: Web browser; World Bank Data portal/API; UN WPP downloads; Python (pandas) or R (tidyverse) for processing; MySQL Workbench or PostgreSQL for schema and views; Git (optional) for scripts; spreadsheet for spot checks only.

Institutions

Categories

Economics, Demography

Licence