Normalized Multi-Table Socioeconomic, Population, Education, and Investment Dataset for 50 Countries (2018–2020)
Description
This dataset is a normalized, multi-table relational database of socioeconomic, demographic, educational, and investment indicators for 50 selected countries for the years 2018, 2019, and 2020. It is organized into five interconnected tables: Country, Economic_Indicators, Population_Statistics, Education_Statistics, and Investment_Indicators. The tables are linked through a common CountryID key, which maintains referential integrity and consistent relationships across the database.
- Economic_Indicators: Gross Domestic Product (GDP), GDP per capita, inflation rate, and unemployment rate.
- Population_Statistics: total population and the percentage of the population living in urban areas.
- Education_Statistics: government expenditure on education as a percentage of GDP.
- Investment_Indicators: foreign direct investment (FDI) inflows as a percentage of GDP.
Together, these tables cover economic performance, demographic trends, education investment, and international capital flows. The data were collected from publicly available international sources, primarily the World Bank’s World Development Indicators, and were cleaned, filtered, and normalized to support efficient querying, maintain data integrity, and enable meaningful analysis. The dataset is suitable for comparative analysis of economic development, demographic change, and policy-related indicators across countries and years.
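The five-table layout described above could be sketched roughly as follows. This is an illustrative sketch, not the published schema: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and every column name other than CountryID (as well as the surrogate key names and the `build_schema` helper) is an assumption made for illustration.

```python
import sqlite3

# Assumed layout: only the table names and the CountryID key come from the
# dataset description; all other column names are hypothetical.
SCHEMA = """
CREATE TABLE Country (
    CountryID   INTEGER PRIMARY KEY,
    CountryName TEXT NOT NULL UNIQUE
);
CREATE TABLE Economic_Indicators (
    EconomicID       INTEGER PRIMARY KEY,
    CountryID        INTEGER NOT NULL REFERENCES Country(CountryID),
    Year             INTEGER NOT NULL,
    GDP              REAL,
    GDPPerCapita     REAL,
    InflationRate    REAL,
    UnemploymentRate REAL,
    UNIQUE (CountryID, Year)
);
CREATE TABLE Population_Statistics (
    PopulationID       INTEGER PRIMARY KEY,
    CountryID          INTEGER NOT NULL REFERENCES Country(CountryID),
    Year               INTEGER NOT NULL,
    TotalPopulation    INTEGER,
    UrbanPopulationPct REAL,
    UNIQUE (CountryID, Year)
);
CREATE TABLE Education_Statistics (
    EducationID          INTEGER PRIMARY KEY,
    CountryID            INTEGER NOT NULL REFERENCES Country(CountryID),
    Year                 INTEGER NOT NULL,
    EducationExpPctOfGDP REAL,
    UNIQUE (CountryID, Year)
);
CREATE TABLE Investment_Indicators (
    InvestmentID      INTEGER PRIMARY KEY,
    CountryID         INTEGER NOT NULL REFERENCES Country(CountryID),
    Year              INTEGER NOT NULL,
    FDIInflowPctOfGDP REAL,
    UNIQUE (CountryID, Year)
);
"""


def build_schema():
    """Create the sketched tables in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

In this sketch, the UNIQUE (CountryID, Year) constraint keeps each country-year to a single row per table, and the foreign key rejects indicator rows whose CountryID is missing from Country.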
Steps to reproduce
The dataset was created through a structured process of data collection, cleaning, transformation, normalization, and database implementation.
Relevant indicators were identified from the World Bank data portal: GDP, GDP per capita, inflation rate, unemployment rate, total population, percentage of urban population, government education expenditure, and foreign direct investment. The raw datasets were downloaded in Excel format and initially contained multiple years, countries, and aggregated regions.
The data were cleaned by selecting only the required years (2018, 2019, and 2020) and keeping only the 50 predefined countries. Inconsistencies in country names were resolved so that they matched the database's Country table. The cleaned data were then transformed from wide format, where each year is a separate column, to long format, where each row is a single country-year observation. This transformation made the data compatible with the relational database design and supported Third Normal Form (3NF).
The database schema was implemented in MySQL Workbench, with each table defined with appropriate primary keys, foreign keys, and unique constraints to maintain data integrity. Data were inserted using SQL INSERT statements, and additional attributes were populated using UPDATE statements. Finally, SQL COUNT and JOIN queries were run to confirm data completeness and relational consistency, and the finalized dataset was exported as CSV files for publication.
Microsoft Excel and MySQL Workbench were used for transformation, and MySQL Workbench was also used for database implementation and querying.
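The wide-to-long reshaping and the COUNT and JOIN checks described above might look like the following minimal sketch. It is not the authors' actual procedure (which used Excel and MySQL Workbench): it uses plain Python and sqlite3 as stand-ins, the function names, the single generic Indicator table, and its column names are hypothetical, and the sample values are placeholders rather than real data.

```python
import sqlite3

YEARS = (2018, 2019, 2020)


def wide_to_long(wide_rows, years=YEARS):
    """Turn one row per country (a column per year) into one row per country-year."""
    long_rows = []
    for row in wide_rows:
        for year in years:
            long_rows.append(
                {"country": row["country"], "year": year, "value": row[str(year)]}
            )
    return long_rows


def load_and_verify(long_rows):
    """Load long-format rows and return (row count, row count after joining Country)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Country (
            CountryID   INTEGER PRIMARY KEY,
            CountryName TEXT NOT NULL UNIQUE
        );
        CREATE TABLE Indicator (  -- hypothetical stand-in for one indicator table
            CountryID INTEGER NOT NULL REFERENCES Country(CountryID),
            Year      INTEGER NOT NULL,
            Value     REAL,
            UNIQUE (CountryID, Year)
        );
        """
    )
    names = sorted({r["country"] for r in long_rows})
    conn.executemany("INSERT INTO Country (CountryName) VALUES (?)", [(n,) for n in names])
    # Look up each CountryID by name while inserting the observation.
    conn.executemany(
        "INSERT INTO Indicator (CountryID, Year, Value) "
        "SELECT CountryID, ?, ? FROM Country WHERE CountryName = ?",
        [(r["year"], r["value"], r["country"]) for r in long_rows],
    )
    total = conn.execute("SELECT COUNT(*) FROM Indicator").fetchone()[0]
    joined = conn.execute(
        "SELECT COUNT(*) FROM Indicator i JOIN Country c ON c.CountryID = i.CountryID"
    ).fetchone()[0]
    return total, joined


if __name__ == "__main__":
    # Placeholder values only; these are not figures from the dataset.
    wide = [
        {"country": "Country A", "2018": 1.0, "2019": 2.0, "2020": 3.0},
        {"country": "Country B", "2018": 4.0, "2019": 5.0, "2020": 6.0},
    ]
    total, joined = load_and_verify(wide_to_long(wide))
    print(total, joined)
```

If the two counts match, every indicator row resolves to a country. For the full dataset, and assuming no country-year is missing, each indicator table would be expected to hold 50 × 3 = 150 rows.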
Institutions
- Wentworth Institute of Technology, Boston, Massachusetts