Global Economic and Population Indicators Dataset (Normalized Relational Structure, 1976–2025)
Description
This dataset is a normalized relational database of global economic and population indicators drawn from publicly available international sources, including the World Bank and the United Nations. It covers key economic performance indicators and demographic measurements across multiple countries and years (1976–2025). The raw source files contained redundant, inconsistently named, and unstructured attributes; these were cleaned and normalized to reduce redundancy and improve data integrity.

The final schema comprises five tables: country, economic_indicator, economic_measurement, population_indicator, and population_measurement, with relationships enforced through foreign keys to maintain referential integrity. The economic_measurement table stores country-level economic metrics, linking country codes with specific indicators and years. The population_measurement table stores demographic data segmented by age group, sex, category, and year. The economic_indicator and population_indicator tables hold the metadata that defines the meaning of each measurement.

The dataset is intended for academic and analytical use, enabling exploration of relationships between economic performance and demographic trends across countries. Its normalized structure supports efficient querying, scalability, and integration into workflows such as machine learning, statistical analysis, and data visualization.
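For concreteness, the sketch below shows one plausible MySQL realization of this schema. The five table names and the shared country_code key come from the dataset description; every other column name and type is an assumption made for illustration, not the published definition.

```sql
-- Minimal schema sketch (MySQL). Table names and country_code follow the
-- dataset description; other column names and types are assumptions.
CREATE TABLE country (
    country_code CHAR(3) PRIMARY KEY,        -- shared identifier across tables
    country_name VARCHAR(100) NOT NULL       -- assumed descriptive column
);

CREATE TABLE economic_indicator (
    indicator_code VARCHAR(32) PRIMARY KEY,  -- assumed key
    indicator_name VARCHAR(255) NOT NULL     -- metadata describing the metric
);

CREATE TABLE population_indicator (
    indicator_code VARCHAR(32) PRIMARY KEY,
    indicator_name VARCHAR(255) NOT NULL
);

CREATE TABLE economic_measurement (
    country_code   CHAR(3)     NOT NULL,
    indicator_code VARCHAR(32) NOT NULL,
    year           SMALLINT    NOT NULL,
    value          DECIMAL(18,6),
    PRIMARY KEY (country_code, indicator_code, year),
    FOREIGN KEY (country_code)   REFERENCES country (country_code),
    FOREIGN KEY (indicator_code) REFERENCES economic_indicator (indicator_code)
);

CREATE TABLE population_measurement (
    country_code   CHAR(3)     NOT NULL,
    indicator_code VARCHAR(32) NOT NULL,
    year           SMALLINT    NOT NULL,
    age_group      VARCHAR(16) NOT NULL,     -- segmentation attributes from
    sex            VARCHAR(8)  NOT NULL,     -- the description: age group,
    category       VARCHAR(32) NOT NULL,     -- sex, category, and time
    value          DECIMAL(18,6),
    PRIMARY KEY (country_code, indicator_code, year, age_group, sex, category),
    FOREIGN KEY (country_code)   REFERENCES country (country_code),
    FOREIGN KEY (indicator_code) REFERENCES population_indicator (indicator_code)
);
```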
Steps to reproduce
To reproduce this dataset, begin by collecting the raw economic and population data from publicly available sources such as the World Bank and United Nations databases, which publish large country-level indicator files spanning multiple years. The raw data typically contains redundant attributes, inconsistent naming conventions, and unstructured formats that must be processed before analysis. The workflow then proceeds as follows; hedged SQL sketches illustrating each step appear below, after the tool summary.

1. Import. Load the raw datasets into a relational database system such as MySQL using MySQL Workbench. Assign appropriate data types and inspect the tables to understand their structure and identify inconsistencies.

2. Cleaning. Standardize column names, enforce consistent formats across datasets, remove null or irrelevant records, and align shared attributes such as country identifiers and time fields.

3. Normalization. Decompose the raw tables into third normal form (3NF) to eliminate redundancy and improve data integrity. The economic dataset is split into country, economic_indicator, and economic_measurement; the population dataset into population_indicator and population_measurement. Each table is designed so that every non-key attribute depends only on the primary key, removing partial and transitive dependencies.

4. Relationships. Establish foreign key constraints between tables: economic_measurement references both country and economic_indicator, while population_measurement references both population_indicator and country. The country_code field serves as the shared identifier linking demographic and economic data. SQL operations such as ALTER TABLE, UPDATE with JOINs, and constraint definitions populate and enforce these relationships.

5. Validation and export. Check referential integrity by confirming that every foreign key value matches an existing record and that the schema accurately represents the data, then export the normalized tables for sharing and documentation.

The tools used are MySQL Workbench for database design and execution, SQL for data manipulation and transformation, and spreadsheet software such as Microsoft Excel for initial inspection and validation. Following this workflow, another researcher can reproduce the dataset or rebuild the database from the original sources.
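Step 1 (import): a minimal sketch of loading one raw file into a staging table with MySQL's LOAD DATA. The file name raw_economic.csv and the staging columns are hypothetical; everything is imported as text first so that type casts can be applied deliberately during cleaning.

```sql
-- Hypothetical staging table; columns mirror a typical raw indicator file.
CREATE TABLE raw_economic (
    country_name   VARCHAR(100),
    country_code   VARCHAR(10),
    indicator_name VARCHAR(255),
    indicator_code VARCHAR(64),
    year           VARCHAR(10),   -- kept as text, cast during normalization
    value          VARCHAR(64)
);

-- File name and layout are assumptions; adjust to the downloaded source file.
LOAD DATA LOCAL INFILE 'raw_economic.csv'
INTO TABLE raw_economic
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;
```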
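Step 2 (cleaning): illustrative passes over the staging table. The '..' placeholder is a convention World Bank extracts commonly use for missing values; treat it as an assumption and verify it against the actual files.

```sql
-- Align country identifiers to one consistent format.
UPDATE raw_economic
SET country_code = UPPER(TRIM(country_code));

-- Drop rows with no usable measurement ('..' is a common World Bank
-- missing-value marker; confirm against the actual source files).
DELETE FROM raw_economic
WHERE value IS NULL OR TRIM(value) IN ('', '..');
```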
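Step 3 (normalization): once the staging table is clean, the 3NF tables sketched in the Description can be populated with INSERT ... SELECT DISTINCT, one straightforward way to perform the decomposition described above.

```sql
-- Decompose the staging table into the normalized tables (illustrative).
INSERT INTO country (country_code, country_name)
SELECT DISTINCT country_code, country_name FROM raw_economic;

INSERT INTO economic_indicator (indicator_code, indicator_name)
SELECT DISTINCT indicator_code, indicator_name FROM raw_economic;

INSERT INTO economic_measurement (country_code, indicator_code, year, value)
SELECT country_code,
       indicator_code,
       CAST(year  AS UNSIGNED),
       CAST(value AS DECIMAL(18,6))
FROM raw_economic;
```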
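Step 4 (relationships): the steps above name ALTER TABLE and UPDATE with JOINs as the operations used; a sketch of both follows. The constraint name fk_em_country and the raw_population staging table (analogous to raw_economic) are assumptions for illustration.

```sql
-- Add a foreign key after the data is loaded (constraint name is assumed).
ALTER TABLE economic_measurement
    ADD CONSTRAINT fk_em_country
    FOREIGN KEY (country_code) REFERENCES country (country_code);

-- Backfill missing country codes in a staging table by joining on the
-- country name, so every row can satisfy the constraint (illustrative).
UPDATE raw_population rp
JOIN country c ON c.country_name = rp.country_name
SET rp.country_code = c.country_code
WHERE rp.country_code IS NULL OR rp.country_code = '';
```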
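Step 5 (validation): an orphan check is one simple way to confirm referential integrity before export; the query should return zero rows. The same pattern applies to every foreign key in the schema.

```sql
-- Measurements whose country_code has no matching country row (expect none).
SELECT em.country_code, COUNT(*) AS orphan_rows
FROM economic_measurement em
LEFT JOIN country c USING (country_code)
WHERE c.country_code IS NULL
GROUP BY em.country_code;
```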
Institutions
- Wentworth Institute of Technology, Boston, Massachusetts