Global Economic and Demographic Indicators Dataset (1974-2023): A 50-Country Normalized SQL Database
Description
This dataset presents a comprehensive, fully normalized relational database capturing critical economic, investment, and demographic indicators for 50 diverse countries spanning a 50-year period from 1974 to 2023. Designed specifically for relational database management systems (RDBMS), the dataset is structured in Third Normal Form (3NF) to eliminate data redundancy and ensure referential integrity. Instead of raw flat files, this dataset is provided as a complete, self-contained SQL dump (.sql file) that includes the Data Definition Language (DDL) to create the schema and the Data Manipulation Language (DML) to populate the records. The database is divided into four interconnected tables: a central 'Country' parent table, and three child tables ('Economic_Record', 'Investment_Record', and 'Demographic_Record'). The Country table establishes the primary entity with unique identifiers (Country_ID), ISO Alpha-3 codes, continent classifications, and development statuses (Developed vs. Developing). The Economic_Record table details annual Gross Domestic Product (GDP in current USD), GDP per capita, inflation rates (consumer prices), and unemployment rates. The Investment_Record table tracks Foreign Direct Investment (FDI) net inflows, gross capital formation (total investment as a percentage of GDP), and gross national savings. Finally, the Demographic_Record table incorporates United Nations data on total population, life expectancy at birth, and annual population growth rates. By synthesizing authoritative data from the World Bank and the UN, this dataset offers an integrated resource for researchers and students looking to perform complex SQL querying and cross-country comparisons using clean, relational data architectures.
Files
Steps to reproduce
To reproduce this dataset or rebuild the database architecture from the original raw sources, researchers must follow a strict data pipeline involving data collection, transformation, and normalization into a Third Normal Form (3NF) schema. This specific repository utilizes a self-contained .sql dump file to streamline the rebuilding process. Data Collection: World Bank Data: Navigate to the World Bank DataBank (World Development Indicators). Select the 50 target countries and an exact 50-year time frame (1974-2023). Extract the following series: GDP (current US$), GDP per capita, Inflation consumer prices (annual %), Unemployment total (%), Foreign direct investment net inflows, Gross capital formation (% of GDP), and Gross savings (% of GDP). Download as a CSV. United Nations Data: Navigate to the UN World Population Prospects portal. Download the Demographic Indicators CSV. Extract Total Population, Population Growth Rate, and Life Expectancy at Birth. Data Processing and Transformation (Wide to Long): The raw datasets are initially structured in a 'Wide' format (years as individual columns). To prepare this for a relational database, the data must be unpivoted into a 'Long' format using Python (pandas) or Excel Power Query. All year columns are collapsed into a single 'Record_Year' attribute. Database Design and Normalization Workflow: Parent Table Creation: Extract the 50 unique country names and their ISO Alpha-3 codes. Assign an auto-incrementing integer Primary Key (Country_ID). Append manual geographical ('Continent') and economic classifications ('Development_Status'). Foreign Key Mapping: In the World Bank and UN datasets, completely remove the string-based 'Country Name' columns. Perform a VLOOKUP or use pandas mapping to replace the text names with the corresponding numeric 'Country_ID' from the parent table to satisfy normalization rules. SQL Generation: Once the data is normalized, table schemas are written in SQL using CREATE TABLE statements with appropriate primary and foreign key constraints. The cleaned data is then inserted using INSERT INTO statements. Rebuilding using the SQL file: Researchers downloading this repository do not need to repeat the manual CSV imports. Simply open MySQL Workbench (or any MySQL-compatible RDBMS), create a new blank schema, and execute the provided .sql file. The script contains all 'CREATE TABLE' and 'INSERT' commands required to instantly reconstruct the interconnected Country, Economic, Investment, and Demographic tables, automatically enforcing all foreign key constraints. Tools Used: World Bank DataBank, UN Population Division Portal, Python (Pandas), Microsoft Excel, and MySQL Workbench (for database administration and SQL export).
Institutions
- Wentworth Institute of TechnologyMassachusetts, Boston