Global_Innovation_Index_Clustered_Panel_Data__With_PCA_2013_2022

Published: 31 December 2024 | Version 1 | DOI: 10.17632/xrr862ssjd.1
Contributor:
Edilvando Pereira Eufrazio

Description

The Global Innovation Index (GII), published annually by the World Intellectual Property Organization (WIPO), ranks the innovation performance of approximately 130 economies. This dataset refines the GII by retaining the 118 economies with complete data for all years (2013–2022) across the seven core pillars: Institutions, Human Capital and Research, Infrastructure, Market Sophistication, Business Sophistication, Knowledge and Technology Outputs, and Creative Outputs. GII scores and the Innovation Input and Output Sub-Indices are included for reference but excluded from clustering and PCA, so that the analysis rests on the more granular pillar scores.

The data were clustered with the K-means algorithm, using five clusters, a number chosen with the Elbow Method. The same number of clusters was applied to every configuration: input pillars, output pillars, and all pillars combined, each clustered both directly and after PCA. PCA was used to reduce dimensionality and improve cluster separability, which yields a second set of cluster labels. The dataset enables detailed comparative analyses of innovation profiles, benchmarking, and studies of temporal trends across economies.

Data Structure:
• Identifiers: Economy Name and ISO Code.
• Seven Core Pillars (2013–2022): Institutions, Human Capital and Research, Infrastructure, Market Sophistication, Business Sophistication, Knowledge and Technology Outputs, and Creative Outputs. The first five are the input pillars and the last two the output pillars, matching the GII's Input and Output Sub-Indices.
• Indices: GII Scores and Innovation Input/Output Sub-Indices for all years.
• Cluster Labels: Direct clusters for input, output, and all pillars; PCA-enhanced clusters for input, output, and all pillars.

This dataset is valuable for benchmarking innovation performance, supporting evidence-based policymaking, and studying global innovation trends.
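The Elbow Method used to fix the number of clusters at five can be illustrated with a short sketch. It assumes scikit-learn and uses a synthetic matrix from `make_blobs` as a stand-in for 118 economies × 7 standardized pillar scores; it is not the author's script, and the knee rule (largest second difference of the inertia curve) is a hypothetical automation of what is often read off a plot by eye.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler


def elbow_inertias(X, k_max=10, seed=0):
    """Return K-means inertia (within-cluster sum of squares) for k = 1..k_max."""
    inertias = []
    for k in range(1, k_max + 1):
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        inertias.append(km.inertia_)
    return np.array(inertias)


def knee_by_second_difference(inertias):
    """Pick the k where the inertia curve bends most sharply.

    Hypothetical heuristic: second_diff[i] is centred on k = i + 2,
    so the argmax is shifted by 2 to recover k.
    """
    second_diff = inertias[:-2] - 2 * inertias[1:-1] + inertias[2:]
    return int(np.argmax(second_diff)) + 2


# Synthetic stand-in data, NOT GII values.
X, _ = make_blobs(n_samples=118, n_features=7, centers=5, random_state=42)
Z = StandardScaler().fit_transform(X)
inertias = elbow_inertias(Z)
k = knee_by_second_difference(inertias)
print("inertia by k:", np.round(inertias, 1))
print("suggested k:", k)
```

On real pillar data, the inertia curve is usually inspected visually as well, since an automated knee rule can disagree with an analyst's reading when the bend is gradual.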

Steps to reproduce

1. Data Source: The dataset was sourced from the Global Innovation Index (GII) portal at WIPO.
2. Data Preprocessing:
• Filter economies with complete data across all years (2013–2022) for the seven core pillars:
  • Institutions index
  • Human capital and research index
  • Infrastructure index
  • Market sophistication index
  • Business sophistication index
  • Knowledge and technology outputs index
  • Creative outputs index
• Include the Global Innovation Index scores and the Innovation Input and Output Sub-Indices for all years as reference data, but exclude them from clustering and PCA.
3. Data Normalization:
• Normalize the data using StandardScaler, which standardizes features by removing the mean and scaling to unit variance (mean = 0, std = 1).
4. Clustering:
• Apply K-means clustering to three configurations:
  • Input pillars only.
  • Output pillars only.
  • All pillars combined.
• Use the Elbow Method to determine the optimal number of clusters (set to five). Apply this choice of five clusters consistently across all clustering configurations.
5. PCA-Enhanced Clustering:
• Apply PCA to reduce dimensionality and improve cluster separability for the same three configurations:
  • Input pillars.
  • Output pillars.
  • All pillars combined.
• Perform clustering on the PCA-reduced data.
6. Output: Include columns for:
• Clusters based on input pillars, output pillars, and all pillars.
• PCA-enhanced clusters for input pillars, output pillars, and all pillars.
• Retain the GII scores and sub-indices for all years as additional data for reference or supplementary analyses.

By following these steps, the dataset can be reproduced and extended for specific applications.
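The reproduction steps can be sketched in Python, assuming scikit-learn. Several details are assumptions not stated in the source: the column order, the function and label names (`cluster_configurations`, `cluster_input`, `pca_cluster_input`, and so on, all hypothetical), keeping enough principal components to explain 90% of the variance, and feeding one row per observation (the source does not say whether economy-years were pooled or each year was clustered separately). The input/output split follows the GII's standard assignment of pillars to its Input and Output Sub-Indices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical column order for the seven pillar scores.
PILLARS = [
    "Institutions", "Human capital and research", "Infrastructure",
    "Market sophistication", "Business sophistication",
    "Knowledge and technology outputs", "Creative outputs",
]
# GII convention: first five pillars are inputs, last two are outputs.
CONFIGS = {"input": list(range(5)), "output": [5, 6], "all": list(range(7))}


def cluster_configurations(X, n_clusters=5, pca_variance=0.90, seed=0):
    """Return direct and PCA-enhanced K-means labels for each configuration.

    X: array of shape (n_observations, 7), columns ordered as PILLARS.
    pca_variance is an assumed setting; the source does not state it.
    """
    if X.shape[1] != len(PILLARS):
        raise ValueError(f"expected {len(PILLARS)} pillar columns, got {X.shape[1]}")
    labels = {}
    for name, cols in CONFIGS.items():
        # Step 3: standardize each pillar to mean 0, std 1.
        Z = StandardScaler().fit_transform(X[:, cols])
        # Step 4: K-means directly on the standardized pillars.
        labels[f"cluster_{name}"] = KMeans(
            n_clusters=n_clusters, n_init=10, random_state=seed
        ).fit_predict(Z)
        # Step 5: keep components explaining pca_variance, then cluster.
        Z_pca = PCA(n_components=pca_variance, svd_solver="full").fit_transform(Z)
        labels[f"pca_cluster_{name}"] = KMeans(
            n_clusters=n_clusters, n_init=10, random_state=seed
        ).fit_predict(Z_pca)
    return labels


# Synthetic stand-in for 118 economies, NOT GII data.
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=40, scale=12, size=(118, 7))
result = cluster_configurations(X_demo)
for name, lab in result.items():
    print(name, np.bincount(lab, minlength=5))
```

Because StandardScaler scales each column independently, standardizing within each configuration gives the same values as standardizing all seven pillars once and then selecting columns; the per-configuration form just keeps each branch self-contained.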

Institutions

Universidade Federal Fluminense

Categories

Innovation, Investment

Licence