SDG mentioning in corporate sustainability reports 2016/2020

Published: 10 November 2023 | Version 3 | DOI: 10.17632/nwckk5ds9k.3
Contributor:
Johannes van der Waal

Description

This dataset contains the frequencies with which the SDGs are mentioned in the corporate sustainability reports of 300 large enterprises taken from the Stoxx Global 3000, for two reporting years (2016 and 2020). The companies form three equal groups: USA, European and East-Asian (Japan, Korea, Taiwan or "JKT"). The sustainability reports of these 300 companies were collected from a database (corporateregister.com). All texts were analysed for the presence of SDG-related words using a dictionary created by the author (SDG-dictionary.txt). The dictionary consists of characteristic SDG words taken from the SDG foundational documents (the text of the UN resolution).

The data set can be used to explore the sustainability reporting practices of large stock-listed companies in connection with financial and organizational variables. Additionally, the data can be used to explore other features of sustainability reporting, as the original document-feature matrix (dfm) has also been included.

The second version of this data set also contains text fragments of the reports that contain references to the SDGs. They come in two forms, text fragments and sentences, both of which contain any of the phrases "sustainable development goals", "sdgs", "united nations", "2030 agenda", and "global compact". These are zipped text files that can be imported into a CAQDAS programme for manual text analysis (coding). The file names indicate the company's ISIN and the reporting year.
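The sentence selection described above can be illustrated with a minimal Python sketch. It is not the author's procedure (the original processing was done in R with quanteda); the naive sentence splitter, the plain substring matching and the function names `split_sentences` and `sdg_sentences` are assumptions made for illustration only. The key phrases are the ones listed in the description.

```python
import re

# Key phrases listed in the dataset description.
KEY_PHRASES = [
    "sustainable development goals",
    "sdgs",
    "united nations",
    "2030 agenda",
    "global compact",
]

def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def sdg_sentences(text):
    """Return the sentences containing at least one key phrase, ignoring case."""
    pattern = re.compile("|".join(re.escape(p) for p in KEY_PHRASES), re.IGNORECASE)
    return [s for s in split_sentences(text) if pattern.search(s)]

# Made-up sample text, not taken from any report in the dataset.
sample = (
    "We support the Sustainable Development Goals. "
    "Revenue grew by four percent. "
    "Our targets follow the 2030 Agenda."
)
matches = sdg_sentences(sample)
```

Here `matches` keeps the first and third sentence of the sample and drops the second, which contains none of the phrases.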

Files

Steps to reproduce

The dataset contains the ISIN code, the year and the SDG word frequencies. The document_ID contains a code for the type of report:

SR = stand-alone sustainability report
IR = integrated report
GC = Global Compact Communication of Progress Report
ER = environmental report
HR = human resources report
CR = Climate related financial report
GR = GRI content index (separately published)

For each company, all reports were retrieved as they appeared in the corporateregister.com database. Some companies have more than one report. If desired, the scores per SDG can be merged to have one score per company. The data was used to estimate the weight of the different SDG topics in the reports. The frequencies are available as absolute and relative counts (weighted by the number of words in the document).

1. Get the Stoxx Global 3000 list.
2. Select 100 large companies from each country group using propensity matching on company size (log assets). These are in the file "company_list.csv".
3. Collect sustainability reports in PDF form.
4. Convert PDF to text.
5. Make a corpus and tokenize, removing stopwords and company names from the text.
6. Convert the tokenized text to a document-feature matrix (dfm).
7. Create the SDG dictionary. This is the file "sdg-dictionary.txt", included here just for reference.
8. Map the SDG dictionary on the dfm, absolute or weighted.
9. Export the output to a data file. These are the files "SDG_frequencies_absolute.txt" and "SDG_frequencies-weighted.txt". The files have 545 documents, from 250 unique companies. Some companies have more than one report per year. You can merge the scores if you want, or select only the document type that is of interest to you. The missing 50 companies did not publish a sustainability report in the years 2016 or 2020. Comparing the ISINs from the company list with the SDG_frequencies files will show which companies did not publish a report.
10. Merge the company list with the SDG frequencies files.

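Steps 5 to 8 can be sketched in Python to show what the absolute and weighted frequencies represent. This is an illustration, not the original quanteda workflow: the two dictionary entries, the stopword list and the function names are hypothetical, only single-word entries are handled, and dividing by the token count after stopword removal is an assumption about how the weighting was applied.

```python
import re
from collections import Counter

# Hypothetical toy dictionary; the real entries are in "sdg-dictionary.txt".
TOY_DICTIONARY = {
    "sdg01": {"poverty"},
    "sdg13": {"climate", "emissions"},
}
# Hypothetical stopword list, far shorter than a real one.
STOPWORDS = {"the", "and", "of", "to", "we", "our", "in"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords (step 5)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def sdg_counts(text, weighted=False):
    """Count dictionary hits per SDG (steps 6 to 8), optionally divided by document length."""
    tokens = tokenize(text)
    freq = Counter(tokens)  # one row of a document-feature matrix
    counts = {sdg: sum(freq[w] for w in words) for sdg, words in TOY_DICTIONARY.items()}
    if weighted:
        total = len(tokens)
        return {sdg: (c / total if total else 0.0) for sdg, c in counts.items()}
    return counts

# Made-up sentence standing in for a full report.
report = "We reduce emissions and climate risk to end poverty in our value chain"
absolute = sdg_counts(report)
relative = sdg_counts(report, weighted=True)
```

For this sample, eight tokens remain after stopword removal, so one hit for `sdg01` gives a weighted score of 0.125 and two hits for `sdg13` give 0.25.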
Data structure:

doc_id: file name of the corporate report, containing ISIN, year, type of report and a serial number if there is more than one report (e.g. for a report plus a separate attachment, like a data report).
sdg01-17 and sdg: SDG word counts, absolute or relative (relative is the count divided by the total word count of the report (dfm)).
gc/gri/int: word counts related to Global Compact, Global Reporting Initiative and the IIRC <IR> integrated reporting standard. A score higher than 0 is indicative of the company being a GC member or following the GRI or IIRC reporting standard.
country, year and company ISIN: extracted from the doc_id.

All the data processing was performed with the R package "quanteda": Benoit, K., Watanabe, K., Wang, H., Nulty, P., Obeng, A., Müller, S., & Matsuo, A. (2018). quanteda: An R package for the quantitative analysis of textual data. Journal of Open Source Software, 3(30), 774. https://doi.org/10.21105/joss.00774

For the quanteda tutorial, see: https://tutorials.quanteda.io/
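Merging scores per company and finding the companies without a report can be sketched as follows. The exact doc_id layout is not specified here, so the underscore-separated form "ISIN_YEAR_TYPE_SERIAL", the example ISINs and the example counts are all hypothetical. Reading the country from the first two characters relies on the ISIN standard's country prefix and is not necessarily how the author derived it.

```python
from collections import defaultdict

# Hypothetical rows; the doc_id layout "ISIN_YEAR_TYPE_SERIAL" is an assumption.
rows = [
    {"doc_id": "US0000000001_2016_SR_1", "sdg": 12},
    {"doc_id": "US0000000001_2016_GC_1", "sdg": 3},
    {"doc_id": "JP0000000002_2020_IR_1", "sdg": 7},
]
# Hypothetical ISINs standing in for the contents of "company_list.csv".
company_isins = {"US0000000001", "JP0000000002", "DE0000000003"}

def parse_doc_id(doc_id):
    """Split an (assumed) doc_id into its components."""
    isin, year, report_type, serial = doc_id.split("_")
    return {
        "isin": isin,
        "country": isin[:2],  # ISINs start with a two-letter country code
        "year": int(year),
        "type": report_type,
        "serial": int(serial),
    }

# Sum absolute counts over all reports of the same company and year.
totals = defaultdict(int)
for row in rows:
    meta = parse_doc_id(row["doc_id"])
    totals[(meta["isin"], meta["year"])] += row["sdg"]

# Companies in the list that have no report in the frequency data.
reporting = {isin for isin, _ in totals}
missing = sorted(company_isins - reporting)
```

Summing is only meaningful for the absolute counts. Weighted scores would have to be recomputed from the combined counts and the combined word totals rather than added together.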

Institutions

Open Universiteit Faculteit Management science en technologie

Categories

Corporate Social Responsibility Reporting, Corporate Sustainability Reporting, Sustainable Development Goals

Licence