How can high-tech manufacturing achieve high innovation productivity? A configurational path analysis under the TOE framework.

Name: How can high-tech manufacturing achieve high innovation productivity? A configurational path analysis under the TOE framework.
Creator: Juan Lin
Published: 2026-03-20T05:36:22.685Z
Keywords: Innovation Management, Innovation Strategy, Technology

Lin, Juan; 孙, 孟超

doi:10.17632/w9r79y5pzn.1

Published: 20 March 2026 | Version 1 | DOI: 10.17632/w9r79y5pzn.1

Contributors:

Juan Lin, 孟超孙

Description

1. What is this dataset?
This repository contains the comprehensive dataset and original execution scripts (in R and Python) supporting the dynamic Qualitative Comparative Analysis (QCA) of high-tech manufacturing innovation productivity in China. It provides all necessary materials to fully reproduce the configurational path analysis, temporal trend visualizations, industry heterogeneity evaluations, and out-of-sample predictive validity tests presented in the manuscript based on the Technology-Organization-Environment (TOE) framework.

2. How was this dataset collected?
The raw panel data were collected from Chinese A-share listed high-tech manufacturing firms covering the period from 2015 to 2024. Financial and patent data were sourced from authoritative databases including CSMAR and WIND.

3. What files are included?
The repository is structured into 6 core files to ensure complete transparency and reproducibility:

PANELDATA.csv: The primary panel dataset containing the foundational data for the analytical sample, used as the main input for the dynamic QCA process.

DYNAMIC.R: The core R script utilizing the QCA and admisc packages. It executes the fuzzy-set calibration, necessity and sufficiency analyses (truth table minimization), and computes both between-group and within-group consistencies across different industry configurations.

Calibrated_Data.csv: The fully calibrated fuzzy-set dataset exported from the main QCA procedure, serving as the direct input for the out-of-sample testing.

Out-of-Sample Predictive Validity Test.py: A Python script utilizing pandas and seaborn to perform predictive validity testing on a holdout sample (2020-2024). It calculates the consistency and coverage of the specific configurations and automatically generates scatter plots for validation.

plot_data.csv: A highly structured dataset specifically extracted and formatted from the QCA clustering results, dedicated to generating temporal trend lines.

photo.R: An R script utilizing the ggplot2 package to read plot_data.csv and visualize the intertemporal evolutionary trends of configurational consistency over the decade.

4. How can this dataset be used?
Researchers and reviewers can download this complete package into a single local directory to achieve "plug-and-play" reproducibility. By running the R and Python scripts sequentially, users can replicate the exact configurational pathways, robustness checks, and high-quality figures discussed in the study. Furthermore, it serves as a methodological template for scholars intending to integrate dynamic QCA with machine-learning-inspired out-of-sample prediction in management research.
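
Before running any script, it may help to confirm that all six files named in the description sit in one directory. The following is a minimal sketch, not part of the repository: the helper name missing_files is hypothetical, and only the file names themselves are taken from the description.

```python
from pathlib import Path

# File names exactly as listed in the repository description.
EXPECTED_FILES = [
    "PANELDATA.csv",
    "DYNAMIC.R",
    "Calibrated_Data.csv",
    "Out-of-Sample Predictive Validity Test.py",
    "plot_data.csv",
    "photo.R",
]


def missing_files(directory="."):
    """Return the expected file names that are not present in `directory`.

    Hypothetical helper for illustration; the repository does not ship it.
    """
    folder = Path(directory)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All six files found.")
```

Running this from the download folder lists anything that still needs to be fetched, which is cheaper than discovering a missing input halfway through the R script.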

Steps to reproduce

Step 1: Environment Setup
Download all the files (.csv, .R, and .py) from this repository and place them into a single local folder. Set this folder as your working directory in both your R and Python environments.

Step 2: Dynamic QCA & Calibration (R Environment)
Open and execute DYNAMIC.R using R or RStudio (requires packages: QCA, admisc, SetMethods, etc.). This script reads PANELDATA.csv to perform fuzzy-set calibration, necessity analysis, and truth table minimization. It also calculates the between-group and within-group consistencies for high-tech manufacturing industries.

Step 3: Out-of-Sample Predictive Validity Testing (Python Environment)
Open and execute Out-of-Sample Predictive Validity Test.py in a Python environment (requires: pandas, numpy, matplotlib, seaborn). This script automatically reads the calibrated data (Calibrated_Data.csv), extracts the 2020-2024 holdout sample, calculates the consistency/coverage for the specific pathways, and generates predictive validity scatter plots.

Step 4: Trend Visualization (R Environment)
Open and execute photo.R in RStudio (requires: ggplot2, reshape2). This script reads the structural dataset plot_data.csv to generate the intertemporal evolutionary trend lines of configurational consistencies across different years.
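
To make the consistency and coverage calculation in Step 3 concrete, here is a minimal sketch using the standard fuzzy-set sufficiency measures: consistency = sum(min(X, Y)) / sum(X) and coverage = sum(min(X, Y)) / sum(Y), where X is membership in a configuration and Y is membership in the outcome. It is an illustration under stated assumptions, not the repository's script: the column names T, O, E and Y, the example pathway, and the toy values are all hypothetical, and plain Python is used here (the actual script relies on pandas) so that the example runs without dependencies.

```python
def config_membership(row, present, absent=()):
    """Fuzzy AND: the minimum over present conditions and negated (1 - x) absent ones."""
    values = [row[c] for c in present] + [1.0 - row[c] for c in absent]
    return min(values)


def consistency_coverage(x, y):
    """Sufficiency consistency and coverage of configuration X for outcome Y."""
    overlap = sum(min(xi, yi) for xi, yi in zip(x, y))
    return overlap / sum(x), overlap / sum(y)


# Toy calibrated memberships; values and column names are invented for illustration.
rows = [
    {"year": 2019, "T": 0.9, "O": 0.9, "E": 0.1, "Y": 0.9},
    {"year": 2020, "T": 0.8, "O": 0.7, "E": 0.3, "Y": 0.9},
    {"year": 2021, "T": 0.6, "O": 0.9, "E": 0.2, "Y": 0.5},
    {"year": 2022, "T": 0.2, "O": 0.4, "E": 0.9, "Y": 0.3},
]

# Keep only the holdout window described in Step 3.
holdout = [r for r in rows if 2020 <= r["year"] <= 2024]

# Hypothetical pathway: T AND O AND NOT E.
x = [config_membership(r, present=["T", "O"], absent=["E"]) for r in holdout]
y = [r["Y"] for r in holdout]

cons, cov = consistency_coverage(x, y)
print(f"consistency={cons:.3f}, coverage={cov:.3f}")
```

With the real data, the rows would come from Calibrated_Data.csv and the present/absent lists would follow the pathways reported by the minimization in Step 2.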
