S&P 500 AI and Governance-Risk Disclosure Panel, 2020–2024
Description
This dataset provides a firm-year panel of non-financial S&P 500 companies for 2020–2024. It was prepared to support empirical analysis of the relationship between the salience of artificial intelligence (AI) disclosure and governance-risk disclosure in large U.S. public firms. After excluding financial-sector firms, observations without usable disclosure text, observations with missing core controls, and duplicate CIK-year records, the final panel contains 2,106 firm-year observations covering 424 unique firms.

The main panel contains firm identifiers (zero-padded CIK codes, company names, and ticker symbols), the year, and the sector classification, together with disclosure-based variables and firm-level controls. The key text-based variables are AI disclosure salience, governance-risk disclosure counts, and the log-transformed governance-risk measure. The financial and governance controls include profitability, leverage, sales growth, firm size, Big 4 auditor status, CEO duality, and related firm-year controls. The workbook also includes supporting documentation sheets: a README, a variable guide, sample checks, and the dictionaries of terms used to identify AI-related and governance-risk language. The dataset is intended for academic research, methodological illustration, and replication of the empirical workflow described in the related manuscript. Users should consult the variable guide and dictionary sheets before reusing or extending the data.
Steps to reproduce
1. Start from the S&P 500 firm-year universe for 2020–2024.
2. Exclude financial-sector firms, because their governance-risk and control disclosures are shaped by sector-specific regulation.
3. Retain only firm-year observations with usable annual disclosure text, valid firm identifiers, sector information, and complete core financial controls.
4. Standardize firm identifiers by converting CIK codes into 10-character, zero-padded text strings.
5. Remove duplicate CIK-year observations so that each row represents one firm in one fiscal year.
6. Construct the AI disclosure salience variable by counting AI-related terms in the disclosure text, using the dictionary provided in the Dictionaries sheet.
7. Construct the governance-risk disclosure variable (the raw count, gov_risk) by counting governance-risk and control-related phrases, using the governance-risk dictionary provided in the workbook.
8. Compute the log-transformed governance-risk variable as governance_risk = ln(1 + gov_risk). Adding 1 before taking the logarithm keeps firm-years with zero matching phrases defined, since ln(0) is undefined.
9. Add firm-year financial and governance controls, including profitability, leverage, sales growth, firm size, Big 4 auditor status, and CEO duality.
10. Run the checks reported in the Sample_Checks sheet to confirm the final structure: 2,106 firm-year observations, 424 unique firms, no financial-sector observations, no duplicate CIK-year rows, and readable zero-padded CIK identifiers.
11. Use the Panel sheet as the main analysis file. The README, Variable_Guide, Sample_Checks, and Dictionaries sheets document the structure, variable definitions, and text-counting rules.
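Steps 4 and 5 (identifier standardization and de-duplication) can be sketched in plain Python as follows. This is an illustrative sketch, not the workflow used to build the dataset: the row layout (dicts with hypothetical keys "cik" and "year") and the rule of keeping the first occurrence of a duplicate CIK-year pair are assumptions. Storing CIKs as 10-character text rather than numbers prevents spreadsheet software from silently dropping the leading zeros.

```python
"""Sketch of steps 4-5: zero-pad CIK codes and drop duplicate CIK-year rows.

Assumptions (not taken from the dataset documentation): rows are plain dicts
with hypothetical keys "cik" and "year", and the first occurrence of each
CIK-year pair is the one kept.
"""


def standardize_cik(raw) -> str:
    """Return a CIK as a 10-character, zero-padded text string."""
    digits = str(raw).strip()
    # Spreadsheet imports sometimes turn integer CIKs into floats like "320193.0".
    if digits.endswith(".0"):
        digits = digits[:-2]
    if not digits.isdigit() or len(digits) > 10:
        raise ValueError(f"not a valid CIK: {raw!r}")
    return digits.zfill(10)


def deduplicate_cik_year(rows):
    """Keep the first row for each (CIK, year) pair, preserving input order."""
    seen = set()
    unique_rows = []
    for row in rows:
        key = (standardize_cik(row["cik"]), int(row["year"]))
        if key in seen:
            continue  # a later duplicate of an already-kept firm-year
        seen.add(key)
        unique_rows.append({**row, "cik": key[0], "year": key[1]})
    return unique_rows


if __name__ == "__main__":
    # Illustrative values only, not rows from the panel.
    sample = [
        {"cik": 320193, "year": 2021},
        {"cik": "0000320193", "year": 2021},  # same firm-year once padded
        {"cik": "789019", "year": 2021},
    ]
    for row in deduplicate_cik_year(sample):
        print(row)
```

Standardizing before de-duplicating matters: an integer 320193 and the string "0000320193" refer to the same firm but would not be recognized as duplicates if compared in their raw forms.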
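Steps 6 to 8 amount to dictionary-based term counting followed by a log transform. The sketch below illustrates that idea under stated assumptions: the term lists are hypothetical placeholders (the real dictionaries are in the workbook and should replace them), and matching is assumed to be case-insensitive and whole-word, tolerating line breaks inside multi-word phrases. The actual text-counting rules are documented in the Dictionaries sheet and may differ, for example in how plurals or overlapping terms are handled.

```python
import math
import re

# Hypothetical placeholder terms; substitute the dictionaries from the workbook.
AI_TERMS = ["artificial intelligence", "machine learning", "generative ai"]
GOV_RISK_TERMS = ["internal control", "material weakness", "cybersecurity risk"]


def count_terms(text: str, terms) -> int:
    """Count case-insensitive, whole-word occurrences of every dictionary term."""
    lowered = text.lower()
    total = 0
    for term in terms:
        # Words of a phrase may be separated by any whitespace (spaces, line
        # breaks); \b keeps "control" from matching inside "controls".
        words = [re.escape(w) for w in term.lower().split()]
        pattern = r"\b" + r"\s+".join(words) + r"\b"
        total += len(re.findall(pattern, lowered))
    return total


def governance_risk(gov_risk: int) -> float:
    """Step 8: governance_risk = ln(1 + gov_risk); defined when gov_risk == 0."""
    return math.log1p(gov_risk)


if __name__ == "__main__":
    text = (
        "We use machine learning and Machine\nLearning tools. Management "
        "identified a material weakness in internal control over reporting."
    )
    gov = count_terms(text, GOV_RISK_TERMS)
    print(count_terms(text, AI_TERMS), gov, governance_risk(gov))
```

`math.log1p` computes ln(1 + x) directly and is numerically accurate for small counts; with the placeholder dictionaries the example text yields two AI mentions and two governance-risk phrases.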
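The structural checks in step 10 can be expressed as simple assertions over the final rows. The sketch below is a hedged illustration, not the contents of the Sample_Checks sheet: the row keys ("cik", "year", "sector") and the financial-sector label "Financials" are assumptions, and the expected counts default to the figures reported for this panel (2,106 rows, 424 firms) but can be overridden for testing on a subset.

```python
from collections import Counter


def run_sample_checks(rows, expected_rows=2106, expected_firms=424,
                      financial_sector="Financials"):
    """Return a dict mapping each structural check to True (pass) or False.

    Assumes hypothetical row keys "cik", "year", and "sector"; the sector
    label used for financial firms is also an assumption.
    """
    cik_year_counts = Counter((r["cik"], r["year"]) for r in rows)
    return {
        "row_count": len(rows) == expected_rows,
        "firm_count": len({r["cik"] for r in rows}) == expected_firms,
        "no_financials": all(r["sector"] != financial_sector for r in rows),
        "no_duplicate_cik_year": all(n == 1 for n in cik_year_counts.values()),
        # Readable identifiers: 10-character text strings made of digits.
        "cik_zero_padded": all(
            isinstance(r["cik"], str) and len(r["cik"]) == 10 and r["cik"].isdigit()
            for r in rows
        ),
    }


if __name__ == "__main__":
    toy = [
        {"cik": "0000000001", "year": 2020, "sector": "Energy"},
        {"cik": "0000000001", "year": 2021, "sector": "Energy"},
        {"cik": "0000000002", "year": 2020, "sector": "Utilities"},
    ]
    print(run_sample_checks(toy, expected_rows=3, expected_firms=2))
```

Returning named booleans rather than raising on the first failure makes it easy to report every failed check at once, in the spirit of a checks sheet.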
Institutions
- Ca' Foscari University of Venice, Veneto, Venice