World Economic and Demographic Data

Published: 14 April 2026| Version 1 | DOI: 10.17632/nh5mjzj63d.1
Contributors:
Eric Wei, Weijie Pang

Description

The "World Economic and Demographic Data" database is a comprehensive, integrated data repository designed to gather data from economics, public health, demography, and social sciences. The database consolidates three major categories of country-level data – economic indicators, demographic statistics, and health metrics – spanning multiple years and countries worldwide. It consists of four interrelated tables that maintain the original data structure while enforcing referential integrity through foreign key constraints. The "World Economic and Demographic Data" database consists of four interrelated tables. The World Country Data table serves as the master reference, storing country codes, country names, and regional grouping information, with a self-referential foreign key to track historical changes in country codes or group affiliations. The Economic table captures annual economic indicators for each country, including GDP, GDP per capita, GDP growth, GNI, trade (imports and exports as percentages of GDP), government debt, and related metrics, with each record uniquely identified by the combination of country code and year. The Demographic Data table provides annual population statistics, such as total, male, and female populations (in thousands), population density, sex ratio, median age, rate of natural change, population change, and population growth rate, also keyed by country code and year. The Health Data table contains annual public health indicators, including fertility rate, life expectancy at birth, crude death and birth rates, net migration, current health expenditure as a percentage of GDP, hospital beds per 1,000 people, DPT immunization coverage, and HIV incidence, again uniquely identified by country code and year. Together, these four tables are linked through foreign key constraints on country code, ensuring referential integrity and enabling integrated cross-domain analysis across economics, demography, and public health. This database design enables researchers to conduct complex cross-sectional and longitudinal analyses, such as examining the relationship between GDP growth and life expectancy, analyzing how population density correlates with healthcare expenditure, or investigating the impact of trade policies on demographic transitions. The structured relational model ensures data integrity, eliminates redundancy, and supports efficient querying across all three data domains.

Files

Steps to reproduce

1. Data Source Identification and Collection The dataset was compiled from publicly available international databases, primarily sourced from the World Bank Open Data portal (data.worldbank.org), the United Nations Statistics Division, and the World Health Organization Global Health Observatory. Four distinct data files were downloaded as CSV (Comma-Separated Values) format based on the following entity definitions: Entity 1 (World Country Data): Country grouping information including group codes, group names, country codes, and historical country code changes. This file serves as the master reference table. Entity 2 (Economic Data): Annual economic indicators including GDP, GNI, trade statistics, government debt, and related growth metrics. Entity 3 (Demographic Data): Population statistics including total population, gender distribution, population density, median age, natural change rates, and population growth rates. Entity 4 (Health Data): Public health indicators including fertility rates, life expectancy, crude death and birth rates, net migration, health expenditure, hospital beds, immunization coverage, and HIV incidence. 2. Data Preprocessing In the original raw data, economic indicators are typically stored in a long format, where each row represents one country, one year, one indicator name, and its corresponding value. While this format is efficient for data storage and allows easy addition of new indicators, it results in multiple rows per country-year combination. The goal is to transform this into a wide format, where each indicator becomes its own column and each country-year combination occupies only one row. The overall workflow begins with loading the raw long-format data into Power Query, followed by selecting the indicator name column as the pivot column and the value column as the data source. After executing the pivot operation without aggregation, the resulting wide-format table is loaded back into Excel. This table is then saved as a CSV file and imported into the MySQL database. 3. Database Design and Schema Creation MySQL was selected as the relational database management system. The World Country Data table uses countrycode as the primary key, with countrycode_previous as a self-referential foreign key to track historical changes. The Economic, Demographic Data, and Health Data tables each use composite primary keys consisting of (Country Code, Year) to ensure each country has at most one record per year per table. Foreign key constraints were added to link Country Code in each data table to countrycode in the master table, with ON UPDATE CASCADE and ON DELETE RESTRICT actions. 4. Data Import Due to poor performance with MySQL Workbench's Table Data Import Wizard, the command-line approach using LOAD DATA LOCAL INFILE was adopted.

Institutions

Categories

Economics, Population, Demographics

Licence