Derived data for “From timber forests to carbon infrastructure”
Description
This dataset supports the manuscript “From timber forests to carbon infrastructure.” It contains redistributable derived data and supplementary tables for a 1908–2026 forest-management discourse analysis in New England. The deposit includes the corpus manifest, extraction log, keyword-domain matrix, document-level scores, year-level scores, period-level scores, and supplementary tables used to support the analysis. Source PDFs and raw extracted full texts are not redistributed because copyright, licence, and access conditions vary by source. Main manuscript figure files are not included in this deposit because they will be submitted directly with the journal manuscript. Analysis scripts are retained by the author and can be made available upon reasonable request after removal of local paths and source-access routines.
Steps to reproduce
1. Download and unzip the dataset.
2. Open `README_MENDELEY_DATA.md` to review the folder structure, source boundary and locked analytical counts.
3. Open the `data/` folder. Use `CORPUS_MANIFEST.csv` to inspect the 215 identified forest-management documents and `EXTRACTION_LOG.csv` to verify the 201 successfully extracted/scored records and 14 OCR-needed or excluded records.
4. Open `KEYWORD_MATRIX.csv` to review the five deterministic discourse domains: production and silviculture, ecological structure, carbon and ecosystem function, disturbance and climate risk, and governance and implementation.
5. Open `DOCUMENT_DOMAIN_SCORES.csv` to inspect document-level normalized domain scores. These scores were calculated as keyword counts normalized per 1,000 words.
6. Open `DOMAIN_SCORES_BY_YEAR.csv` and `DOMAIN_SCORES_BY_PERIOD.csv` to reproduce the year-level and period-level trends reported in the manuscript.
7. Use the period-level scores to calculate the two transition indicators:
   * Carbon transition signal = (C + D + G) - P
   * Governance lag signal = (C + D) - G
8. Compare the calculated period-level indicators with `Supplementary_Table_5_Period_Domain_Scores.csv` in the `supplementary/` folder.
9. Open `Supplementary_Table_1_Corpus_Publications.csv` to review the full corpus bibliography and inclusion/exclusion status. Open `Supplementary_Table_2_Keyword_Matrix.csv`, `Supplementary_Table_3_AI_Methods_References.csv`, `Supplementary_Table_4_OCR_Excluded_Records.csv`, and `Supplementary_Table_5_Period_Domain_Scores.csv` to reproduce the supplementary evidence base.
10. Review `documentation/DATA_DICTIONARY.md` and `documentation/METHODOLOGY_SUMMARY.md` for variable definitions, scoring logic, limitations and interpretation guidance.
11. Note that source PDFs and raw extracted full texts are not redistributed because copyright, licence and access conditions vary by source. The dataset provides metadata, extraction status, derived scores and supplementary tables for audit and reproduction by users with lawful access to the original documents.
12. Analysis scripts are not included in this deposit. They are retained by the author and can be made available upon reasonable request after removal of local paths and source-access routines.
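The per-1,000-word normalization and the two transition indicators described in the steps above can be sketched in Python as follows. This is a minimal illustration, not the author's retained analysis script. It assumes that P, C, D and G stand for the production and silviculture, carbon and ecosystem function, disturbance and climate risk, and governance and implementation domains (a mapping inferred from the domain initials), and the column names `period`, `P`, `C`, `D` and `G` are hypothetical placeholders. The demo values are invented for illustration and are not taken from the dataset.

```python
import csv
import io

# ASSUMPTION: letter-to-domain mapping inferred from the domain initials:
# P = production and silviculture, C = carbon and ecosystem function,
# D = disturbance and climate risk, G = governance and implementation.


def per_thousand(keyword_count, word_count):
    """Normalize a raw keyword count to a rate per 1,000 words."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return 1000.0 * keyword_count / word_count


def transition_indicators(p, c, d, g):
    """Return (carbon transition signal, governance lag signal)."""
    carbon_transition = (c + d + g) - p
    governance_lag = (c + d) - g
    return carbon_transition, governance_lag


def indicators_from_csv(handle, period_col="period", cols=("P", "C", "D", "G")):
    """Compute both indicators for each row of a period-level score table.

    ASSUMPTION: `period_col` and `cols` are hypothetical column names;
    replace them with the names used in the deposited CSV.
    """
    results = {}
    for row in csv.DictReader(handle):
        p, c, d, g = (float(row[name]) for name in cols)
        results[row[period_col]] = transition_indicators(p, c, d, g)
    return results


# Invented demo table standing in for the period-level scores file.
demo = io.StringIO(
    "period,P,C,D,G\n"
    "early,4.0,1.0,2.0,0.5\n"
    "late,1.5,3.0,2.5,2.0\n"
)
results = indicators_from_csv(demo)
for period, (carbon, lag) in results.items():
    print(period, carbon, lag)
```

To apply the sketch to the deposit, pass an open file handle for `DOMAIN_SCORES_BY_PERIOD.csv` in place of `demo`, after checking `documentation/DATA_DICTIONARY.md` for the actual column names.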
Institutions
- Chulalongkorn University, Bangkok