Replication package for "Strategic Cost Efficiency in AI-Driven Financial Management: A Case Study of DeepSeek’s Digital Transformation" (CFRI submission, v1.0)
Description
Replication package for “Strategic Cost Efficiency in AI-Driven Financial Management: A Case Study of DeepSeek’s Digital Transformation” (CFRI submission, v1.0). This dataset contains the exact code, metadata, and analysis-ready files needed to reproduce all figures, tables, diagnostics, and robustness checks reported in Sections 4–6 of the manuscript. All inputs originate from public sources; no proprietary data were used.

Contents
/code/: scripts organized as 0_fetch/ (download public inputs), 1_clean_transform/ (apply transformations & priors), 2_analysis/ (models, diagnostics, robustness), 3_figures_tables/ (renders all outputs), plus a single entry point run_all.{sh|bat|py|R} and a random-seeds.txt.
/data/processed/: analysis-ready CSVs used by the paper. (If redistribution of originals is restricted, raw files are not included; use /code/0_fetch/ to retrieve them.)
/metadata/: provenance_sources.csv mapping each input to Appendix C (C.1–C.2) with URL and access date; transforms_priors.xlsx mirroring Appendix D; variables_dictionary.xlsx (names, units, and construction).
/env/: environment files (requirements.txt or environment.yml) and optional session info to ensure reproducibility.
/output/: auto-generated figures, tables, and diagnostic artifacts that match the manuscript.

Reproducibility
Run the repository root script (run_all…) to: (1) fetch public inputs (when raw files are not redistributed), (2) rebuild processed datasets, and (3) regenerate every figure/table for §§4–6, including diagnostics and robustness panels. The package is platform-agnostic; instructions are provided for a standard Python/Conda setup (optional R/Stata scripts are included where relevant).

Licensing & citation
Data are released under CC BY 4.0; code under MIT. Please cite this dataset’s DOI and the associated article. A CITATION.cff file is provided for citation managers.

Notes
The CHANGELOG.md documents versioning (v1.0 = submission). Post-acceptance updates will add the article DOI and any minor alignment fixes. This record may be embargoed until journal acceptance; metadata remain public.

Corresponding author: Marco I. Bonelli (ORCID 0000-0003-3463-6421).
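
The staged run described under Reproducibility could be driven by an entry point along the following lines. This is a minimal sketch, not the package's actual run_all.py: the four script paths are the ones listed under Steps to reproduce, while the names STAGES, stage_commands, and run_pipeline, and the dry-run default, are assumptions made for illustration.

```python
import subprocess
import sys
from pathlib import Path

# Stage scripts in execution order, as listed under "Steps to reproduce".
STAGES = [
    "code/0_fetch/fetch_sources.py",
    "code/1_clean_transform/make_processed.py",
    "code/2_analysis/run_monte_carlo.py",
    "code/3_figures_tables/render_outputs.py",
]


def stage_commands(root: Path) -> list[list[str]]:
    """Build one interpreter command per stage, in pipeline order."""
    return [[sys.executable, str(root / script)] for script in STAGES]


def run_pipeline(root: Path, dry_run: bool = True) -> list[list[str]]:
    """Return the planned commands; execute them only if dry_run is False.

    ASSUMPTION: the dry-run default is a choice made for this sketch so it
    is safe to run anywhere; the real entry point may behave differently.
    """
    commands = stage_commands(root)
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so a failed stage stops the run before later stages start.
            subprocess.run(cmd, cwd=root, check=True)
    return commands


if __name__ == "__main__":
    # Print the plan without executing anything.
    for cmd in run_pipeline(Path.cwd(), dry_run=True):
        print(" ".join(cmd))
```

Running the stages as separate processes keeps each script independently runnable, and stopping at the first failure avoids rendering figures from stale intermediate files.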
Files
Steps to reproduce
Download & prepare folder
Create a working folder (e.g., deepseek-cfri/). From this dataset’s Excel file, export the following sheets as files with the same names and place them in this structure:

    deepseek-cfri/
        run_all.py
        seeds.txt
        env/requirements.txt
        env/environment.yml
        metadata/provenance_sources.csv
        metadata/transforms_priors.xlsx    ← save sheets “tp__priors” and “tp__transforms” into one .xlsx
        metadata/variables_dictionary.xlsx
        code/0_fetch/fetch_sources.py    ← from sheet “code_0_fetch.py”
        code/1_clean_transform/make_processed.py
        code/2_analysis/run_monte_carlo.py
        code/3_figures_tables/render_outputs.py

Tip (Excel): Save “provenance_sources” as CSV; save “tp__priors” and “tp__transforms” together as transforms_priors.xlsx (two sheets). Save “variables_dictionary” as variables_dictionary.xlsx (one sheet).

Create the software environment (Python ≥3.10)
Conda (recommended):

    conda env create -f env/environment.yml
    conda activate deepseek-cfri

Pip (alternative):

    python -m venv .venv
    # Windows:
    .venv\Scripts\activate
    # macOS/Linux:
    source .venv/bin/activate
    pip install -r env/requirements.txt

Run the pipeline
From the project root:

    python run_all.py

This orchestrates:
(1) code/0_fetch/fetch_sources.py (lists public-source URLs from metadata/provenance_sources.csv; no downloads required for the demo run).
(2) code/1_clean_transform/make_processed.py (transform stubs aligned to metadata/transforms_priors.xlsx).
(3) code/2_analysis/run_monte_carlo.py (10,000-run simulation using priors; seed in seeds.txt = 42).
(4) code/3_figures_tables/render_outputs.py (placeholders to render figures/tables).

Outputs (check these)
After a successful run you should see:

    output/tables/net_margin_summary.csv
    output/diagnostics/subsidy_sensitivity_stats.csv

Open net_margin_summary.csv and verify it contains percentile and mean summaries of simulated net margins, and subsidy_sensitivity_stats.csv shows descriptive stats for the sensitivity run.

Reproducibility notes
Randomness is fixed by seeds.txt (42). Change the seed to explore robustness. All inputs referenced are public; any fetching of raw files (if you choose to enable it) should follow the URLs and access dates listed in metadata/provenance_sources.csv. The package uses only NumPy/Pandas/Matplotlib (see env/requirements.txt or env/environment.yml).

Cite
When using these materials, cite the dataset DOI (this record) and the associated article.
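
To make the seeding note concrete, the sketch below shows how a fixed seed makes a 10,000-run simulation repeatable and how the mean and percentile summary could be computed. It is not the package's run_monte_carlo.py: the run count (10,000) and seed (42) come from the steps above, but the distributions and their parameters are placeholders invented for illustration (the real priors are in metadata/transforms_priors.xlsx), and the function names and the choice of NumPy's default_rng generator are assumptions.

```python
import numpy as np

N_RUNS = 10_000  # run count stated in the steps
SEED = 42        # value stated for seeds.txt


def simulate_net_margin(n_runs: int = N_RUNS, seed: int = SEED) -> np.ndarray:
    """Draw n_runs simulated net margins from placeholder priors."""
    rng = np.random.default_rng(seed)
    # ASSUMPTION: illustrative distributions only; they are NOT the priors
    # documented in metadata/transforms_priors.xlsx.
    revenue = rng.lognormal(mean=0.0, sigma=0.25, size=n_runs)
    cost_ratio = rng.normal(loc=0.85, scale=0.10, size=n_runs)
    costs = revenue * cost_ratio
    # Net margin = (revenue - costs) / revenue; revenue is strictly positive.
    return (revenue - costs) / revenue


def summarize(margins: np.ndarray) -> dict[str, float]:
    """Mean plus selected percentiles of the simulated margins."""
    pcts = [5, 25, 50, 75, 95]
    summary = {"mean": float(margins.mean())}
    for p, value in zip(pcts, np.percentile(margins, pcts)):
        summary[f"p{p}"] = float(value)
    return summary


if __name__ == "__main__":
    for name, value in summarize(simulate_net_margin()).items():
        print(f"{name}: {value:.4f}")
```

Two calls with the same seed return identical arrays, while passing a different seed (for example simulate_net_margin(seed=43)) gives a fresh draw; comparing the resulting summaries is one simple way to explore robustness to the seed.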