Trends in Economic Growth and Income Inequality: Data for 50 Countries (1975–2024)
Description
This dataset provides a longitudinal, multi-dimensional view of the global socioeconomic landscape over the 50 years from 1975 to 2024. It covers 50 nations drawn from five geographic regions and a range of income brackets, and supports analysis of the interplay between macroeconomic expansion, demographic shifts, and changes in wealth distribution. Its primary objective is to facilitate research into how industrial transitions and employment structures influence national prosperity and social equity over a half-century of globalization.

Data Composition and Indicators
The dataset is organized into four thematic domains, all derived from the World Bank Open Data API. Using a single source keeps indicator definitions standardized for cross-country comparison.

1. Demographic and Spatial Dynamics
To capture the changing human footprint, the dataset tracks Total Population alongside the spatial distribution of residents via Urban and Rural Population Percentages. Population Density adds insight into the intensity of land use and urbanization trends.

2. Macroeconomic Performance
Economic health is monitored through Gross Domestic Product (GDP) and GDP per capita, which give absolute and relative measures of national wealth. To account for economic stability and labor market efficiency, the data includes annual Inflation Rates and Unemployment Rates.

3. Employment Structure and Gender Stratification
A distinctive feature of this dataset is its granular breakdown of labor markets. It tracks the percentage of the workforce in Agriculture, Industry, and Services. Each sector is further disaggregated by gender (e.g., Emp_Agri_F_Percent vs. Emp_Agri_M_Percent), enabling researchers to explore gender-specific shifts in labor as economies move from agrarian to service-oriented models.

4. Income Distribution and Poverty Metrics
To address the "Inequality" aspect of the title, the dataset includes the Poverty Headcount Ratio and Mean Income (GNI per capita). It also tracks the income shares held by the Top 10% and Top 20%, contrasted against the shares of the Bottom 10% and Bottom 20%. This allows for the calculation of inequality gaps and the study of middle-class erosion or expansion.

This dataset is suited for researchers investigating:
- The correlation between industrialization (shift from Agriculture to Industry) and the Gini coefficient.
- The impact of gender-based employment shifts on national GDP growth.
- Long-term poverty reduction trends in relation to urban-rural migration.
- The resilience of various income levels against global inflationary periods.
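The inequality gaps mentioned under Income Distribution and Poverty Metrics can be derived directly from the four income-share columns. The following is a minimal sketch, not the author's method: the function name, parameter names, and sample values are hypothetical, and shares are assumed to be expressed in percent.

```python
def inequality_gaps(top10, bottom10, top20, bottom20):
    """Derive simple gap measures from income shares given in percent.

    Parameter names are hypothetical; map them to the dataset's own columns.
    Returns None when a share is missing or a bottom share is zero.
    """
    shares = (top10, bottom10, top20, bottom20)
    if any(s is None for s in shares) or bottom10 == 0 or bottom20 == 0:
        return None
    return {
        # Income share of the top decile relative to the bottom decile.
        "decile_ratio": top10 / bottom10,
        # Income share of the top quintile relative to the bottom quintile.
        "quintile_ratio": top20 / bottom20,
        # Share left for the middle 60% of the distribution.
        "middle_60_share": 100.0 - top20 - bottom20,
    }


# Illustrative values only; these are not taken from the dataset.
example = inequality_gaps(top10=30.0, bottom10=2.5, top20=45.0, bottom20=7.0)
```

Tracking middle_60_share over time for one country is one simple way to look at middle-class erosion or expansion.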
Steps to reproduce
1. Technical Infrastructure and Toolset
Reproducing this dataset requires a software stack that handles data acquisition, transformation, and storage:
- Data Source: World Bank Open Data (accessed via api.worldbank.org).
- Programming Language: Python 3.x.
- Core Libraries: wbgapi (official World Bank API wrapper), pandas (data manipulation), and numpy (numerical processing).
- Database Management System (DBMS): MySQL.
- Modeling Tools: ER diagramming software (MySQL Workbench) for schema visualization.

2. Database Schema Design (DDL Phase)
The first step in reconstruction is to initialize the relational environment. The database is organized as a star-like schema in which a central "Country" dimension table is linked to four thematic "Fact" tables.
Normalization Strategy: The design uses Third Normal Form (3NF). Every indicator table is linked to the master Country table via a Foreign Key (countryID). To maintain logical integrity, a Composite Unique Constraint is applied to the (countryID, year) pair in every indicator table, so no country can have more than one record for a given year.

3. Data Acquisition Workflow (Python Integration)
Data is harvested programmatically to eliminate manual entry errors. The reproduction script follows this logic:
- Indicator Selection: Map the specific World Bank Series IDs (e.g., NY.GDP.MKTP.CD for GDP, SI.POV.DDAY for Poverty) to the corresponding database columns.
- Temporal & Geographic Filtering: Define the scope: 50 ISO-3 country codes and a time range of 1975 to 2024.
- API Call Implementation: Use wb.data.DataFrame() to fetch the data. The script must pass numericTimeKeys=True so that years are returned as integers for SQL compatibility.

4. Data Processing and Transformation
Raw data from the API is delivered in a "Wide" format (years as columns). It must be transformed into a "Long" format (one row per country and year) to fit a relational structure.
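The wide-to-long transformation can be sketched with pandas as follows. This is an illustration under stated assumptions rather than the original reproduction script: the toy frame stands in for the result of wb.data.DataFrame() with numericTimeKeys=True (assumed here to be indexed by economy and series, with integer years as columns), the values are made up, and COLUMN_MAP and wide_to_long are hypothetical names.

```python
import pandas as pd

# Toy stand-in for the wide API result; values are illustrative only.
index = pd.MultiIndex.from_tuples(
    [
        ("BRA", "NY.GDP.MKTP.CD"),
        ("BRA", "SP.POP.TOTL"),
        ("IND", "NY.GDP.MKTP.CD"),
        ("IND", "SP.POP.TOTL"),
    ],
    names=["economy", "series"],
)
wide = pd.DataFrame(
    [
        [1.0e11, 1.5e11],
        [1.0e8, 1.1e8],
        [9.0e10, float("nan")],  # a missing observation
        [6.0e8, 6.2e8],
    ],
    index=index,
    columns=[1975, 1976],
)

# Hypothetical mapping from World Bank series IDs to database column names.
COLUMN_MAP = {"NY.GDP.MKTP.CD": "gdp", "SP.POP.TOTL": "total_population"}


def wide_to_long(wide_df, column_map):
    """Reshape (economy, series) x year into one row per (economy, year)."""
    named = wide_df.rename_axis(columns="year")
    stacked = named.stack()               # index: economy, series, year
    long_df = stacked.unstack("series")   # indicators become columns
    long_df = long_df.rename(columns=column_map)
    long_df.columns.name = None
    return long_df.reset_index()


long_df = wide_to_long(wide, COLUMN_MAP)
```

Each row of long_df then corresponds to one (country, year) record, which matches the composite key used by the indicator tables.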
Pivoting: Use the .stack() and .unstack() methods in pandas to align indicators into columns while keeping (Country, Year) as the index.
Data Cleaning:
- Type Casting: Store SP.POP.TOTL (Total Population) as BIGINT so that values above the signed 32-bit INT limit of about 2.1 billion cannot overflow.
- Handling Nulls: Replace NaN values with the SQL NULL keyword so that the generated INSERT statements do not fail.
- Rounding: Percentage-based indicators (e.g., unemployment_Rate) are rounded to two decimal places for consistency.

5. Database Population and Verification
The final stage involves generating and executing the SQL INSERT statements.
- Master Data: Populate the Country table first to satisfy Foreign Key constraints.
- Sequential Loading: Execute the generated .sql files for Population, Economics, Employment, and Income in sequence.
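The schema constraints, null handling, rounding, and load order described in these steps can be sketched in a few lines. To keep the example runnable without a database server, it swaps in Python's built-in sqlite3 module for MySQL and uses bound parameters instead of generated .sql files. Apart from countryID, year, and unemployment_Rate, the table names, column names, and sample values are assumptions.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(
    """
    CREATE TABLE Country (
        countryID INTEGER PRIMARY KEY,
        iso3      TEXT NOT NULL UNIQUE
    );
    CREATE TABLE Economics (
        countryID         INTEGER NOT NULL REFERENCES Country(countryID),
        year              INTEGER NOT NULL,
        gdp               REAL,
        unemployment_Rate REAL,
        UNIQUE (countryID, year)  -- one record per country and year
    );
    """
)


def clean(value, digits=None):
    """Map NaN or None to None (stored as SQL NULL); optionally round."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None
    return round(value, digits) if digits is not None else value


# Master data first, so the foreign key on Economics can be satisfied.
conn.execute("INSERT INTO Country (countryID, iso3) VALUES (?, ?)", (1, "BRA"))

# Illustrative rows only: (countryID, year, gdp, unemployment rate).
rows = [(1, 1975, 1.0e11, float("nan")), (1, 1976, 1.5e11, 6.789)]
conn.executemany(
    "INSERT INTO Economics VALUES (?, ?, ?, ?)",
    [(c, y, clean(g), clean(u, 2)) for c, y, g, u in rows],
)

# A second record for the same (countryID, year) pair is rejected.
try:
    conn.execute("INSERT INTO Economics VALUES (1, 1976, 2.0e11, 7.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

stored = conn.execute(
    "SELECT year, unemployment_Rate FROM Economics ORDER BY year"
).fetchall()
```

In MySQL the same idea would use BIGINT for the population column and a UNIQUE KEY on (countryID, year); the sketch only demonstrates the constraint behavior.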
Institutions
- Wentworth Institute of Technology, Boston, Massachusetts