CSI Firms' Survival Analysis
Description
1. Define Study Scope
Industry scope: NACE codes 2611, 2612, and 2620 (manufacture of electronic components, loaded electronic boards, and computer equipment).
Population: All Chinese semiconductor SMEs with ≥2 employees.
Observation window: 1980–2019.
Unit of analysis: The individual firm.
Outcome of interest: Firm survival duration (registration → deregistration).

2. Retrieve Firm-Level Data
Primary source: National Enterprise Credit Information Publicity System of China (企业信用信息公示系统).
Fields: Firm name, registration date, deregistration date, employee size, branch status, accounting type, WOCO membership, and location (city-level geocode).
Filter: Keep firms classified under NACE 2611, 2612, or 2620 with “inactive” status by 2019.
Derived variable: survival = deregistration_year – registration_year

3. Construct City-Level Variables
Merge firm records with city-level institutional and infrastructural data from official statistical sources:

| Variable | Source | Computation |
|---|---|---|
| City Clusters | China Statistical Yearbooks (firm counts per city) | SMEs per 100,000 population (log-transformed) |
| Science Park | Ministry of Science and Technology (MOST) list of National Hi-Tech Parks | 1 = presence in city; 0 = none |
| Lead University (985) | Ministry of Education of China | 1 = city has a 985 S&T university; 0 = none |
| FDI Inflow | National Bureau of Statistics (NBS) | 20-year mean FDI per capita (USD, logged) |
| Law Firms’ Density | National Judicial Administration database | Law firms per 100,000 population (logged) |

4. Create Control Variables
Municipality: 1 = Beijing, Shanghai, Tianjin, Chongqing.
Industrial Park: 1 = city has a national-level industrial park.
Branch vs. Independent: 1 = subsidiary or spin-off.
WOCO Compliance: 1 = WOCO member firm.
LF Consolidation: 1 = local-financial reporting level.
Unconsolidated Accounting: 1 = unconsolidated statements.
SIC 2611: 1 = core semiconductor manufacturing.
Coastal City: 1 = coastal province capital.
High-Speed Rail: 1 = connected city (based on China Railway Corp data).
Post-BRI Exit: 1 = exit after 2013.
Post-Tech-War Exit: 1 = exit after 2016.

5. Merge and Clean the Dataset
Merge firm-level and city-level data by city name or administrative code.
Drop duplicate firms and outliers with survival > 60 years.
Apply a log transformation to continuous predictors.
Validate variable distributions and correlations (as in Table 2).

6. Verify Model Readiness
Run a VIF diagnostic to check for multicollinearity (a mean VIF of about 1.9 is acceptable).
Check for violations of the proportional hazards assumption using a Cox PH model.
Switch to an AFT (Weibull) model when proportionality fails.

7. Replicate Statistical Analysis
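The data preparation in steps 2, 4, and 5 can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: the field names (`firm_id`, `reg_date`, `dereg_date`, `city_code`, `fdi_per_capita`), the helper `build_records`, and the sample values are all hypothetical. It also assumes that "exit after 2013" means a deregistration year strictly greater than 2013, and that "logged" means the natural logarithm.

```python
import math
from datetime import date

# Direct-administered municipalities named in step 4.
MUNICIPALITIES = {"Beijing", "Shanghai", "Tianjin", "Chongqing"}


def build_records(firms, cities, max_survival=60):
    """Join firm rows to city rows and derive survival plus exit dummies.

    `firms` is a list of dicts, `cities` maps an administrative code to a
    dict of city-level values. All key names are illustrative assumptions.
    """
    seen_ids = set()
    records = []
    for firm in firms:
        if firm["firm_id"] in seen_ids:      # drop duplicate firms
            continue
        seen_ids.add(firm["firm_id"])
        city = cities.get(firm["city_code"])  # merge by administrative code
        if firm["dereg_date"] is None or city is None:
            continue                          # keep only inactive, matched firms
        exit_year = firm["dereg_date"].year
        survival = exit_year - firm["reg_date"].year
        if survival > max_survival:           # drop outliers (> 60 years)
            continue
        records.append({
            "firm_id": firm["firm_id"],
            "survival": survival,
            "post_bri_exit": int(exit_year > 2013),
            "post_tech_war_exit": int(exit_year > 2016),
            "municipality": int(city["name"] in MUNICIPALITIES),
            "log_fdi": math.log(city["fdi_per_capita"]),  # assumes value > 0
        })
    return records


# Made-up sample rows, for illustration only.
cities = {
    "110000": {"name": "Beijing", "fdi_per_capita": 100.0},
    "310000": {"name": "Shanghai", "fdi_per_capita": 120.0},
    "440300": {"name": "Shenzhen", "fdi_per_capita": 80.0},
}
firms = [
    {"firm_id": "A", "reg_date": date(2001, 5, 1), "dereg_date": date(2015, 3, 1), "city_code": "110000"},
    {"firm_id": "A", "reg_date": date(2001, 5, 1), "dereg_date": date(2015, 3, 1), "city_code": "110000"},
    {"firm_id": "B", "reg_date": date(1950, 1, 1), "dereg_date": date(2018, 1, 1), "city_code": "310000"},
    {"firm_id": "C", "reg_date": date(2010, 1, 1), "dereg_date": date(2018, 6, 1), "city_code": "440300"},
    {"firm_id": "D", "reg_date": date(2012, 1, 1), "dereg_date": None, "city_code": "440300"},
]
records = build_records(firms, cities)
```

In this sample the duplicate row for firm A, the 68-year outlier B, and the still-active firm D are dropped, leaving A and C. The remaining dummies from step 4 could be added to the output dict in the same way.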
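The VIF diagnostic in step 6 can be illustrated without external libraries. The sketch below relies on a standard identity: the VIF of predictor j equals the j-th diagonal element of the inverse of the predictors' correlation matrix. The function names and the toy columns are assumptions made for illustration; the original analysis may well have used a statistics package instead.

```python
import math


def correlation_matrix(columns):
    """Pearson correlations between equal-length, non-constant numeric columns."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([x - mean for x in col])
    norms = [math.sqrt(sum(x * x for x in col)) for col in centered]
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif(columns):
    """Variance inflation factor of each predictor column."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Toy predictors: the two columns are uncorrelated by construction.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.0, -1.0, -1.0, 1.0]
vifs = vif([x1, x2])
mean_vif = sum(vifs) / len(vifs)
```

A mean VIF close to 1 indicates little shared variance among predictors; the threshold quoted in step 6 (mean ≈ 1.9) would be compared against `mean_vif` computed on the real predictor columns.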