8-K reports database 2015 - 2019

Published: 1 June 2022 | Version 1 | DOI: 10.17632/8dj8pdzjt3.1
Florian Barbaro


We create a dataset of 8-K reports filed between 2015 and 2019, restricted to Standard & Poor's 500 companies. An 8-K is a report of unscheduled material events or corporate changes at a company that could be of importance to shareholders or to the Securities and Exchange Commission (SEC). Also known as a Form 8-K, the report notifies the public of events such as acquisitions, bankruptcy, the resignation of directors, or changes in the fiscal year. We compiled the dataset using the SEC's EDGAR tool.

The texts were pre-processed with a classical pipeline:
- removal of non-alphanumeric characters;
- lemmatisation;
- removal of rare words and stopwords.

The file (K8_data_2015_2019.rds) is a list of two items. The first item contains all the information about each 8-K together with the extracted texts. The second item is the document-term matrix built from the pre-processed texts, with 37238 texts and 70223 words. An example of an 8-K can be found at https://www.sec.gov/files/form8-k.pdf.
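The pre-processing steps and the document-term matrix described above can be illustrated with a minimal Python sketch. It is not the code used to build the dataset (which is distributed as an R .rds file). The stopword list, the lowercasing step, the `min_doc_freq` threshold for "rare" words, and all function names are assumptions made for illustration, and `lemmatise` is a placeholder that returns its input unchanged where a real lemmatiser would be called.

```python
import re
from collections import Counter

# Illustrative stopword list (assumption): the list used for the dataset is not stated.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "on", "for", "is", "are"}


def lemmatise(token):
    # Placeholder (assumption): a real pipeline would map the token to its lemma here.
    return token


def preprocess(text):
    # Replace runs of non-alphanumeric characters with a space, then lowercase
    # (lowercasing is an assumption; the description does not mention it).
    cleaned = re.sub(r"[^A-Za-z0-9]+", " ", text).lower()
    tokens = [lemmatise(t) for t in cleaned.split()]
    # Drop stopwords.
    return [t for t in tokens if t not in STOPWORDS]


def document_term_matrix(texts, min_doc_freq=2):
    docs = [preprocess(t) for t in texts]
    # Document frequency: number of texts in which each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # Drop rare words: keep terms appearing in at least min_doc_freq texts.
    vocab = sorted(term for term, n in doc_freq.items() if n >= min_doc_freq)
    column = {term: j for j, term in enumerate(vocab)}
    # One row per text, one column per retained term, cells hold term counts.
    matrix = [[0] * len(vocab) for _ in docs]
    for i, doc in enumerate(docs):
        for term in doc:
            j = column.get(term)
            if j is not None:
                matrix[i][j] += 1
    return vocab, matrix
```

With this layout, the dataset's matrix would have 37238 rows and 70223 columns. At that size a dense list of lists is impractical, so a sparse representation would be the natural choice in practice.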


Steps to reproduce

To reproduce the dataset, search EDGAR for all reports from Standard & Poor's 500 companies using the file SNP500WMVT.rds. This file lists all the companies that were or are part of the index, based on Wikipedia (https://en.wikipedia.org/wiki/List_of_S%26P_500_companies). Then download all 8-K reports from 2015 to 2019 and pre-process the text.
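One possible way to carry out the search step is to filter EDGAR's quarterly index files, which are published at https://www.sec.gov/Archives/edgar/full-index/{year}/QTR{quarter}/master.idx as pipe-delimited lines of the form CIK|Company Name|Form Type|Date Filed|Filename. The Python sketch below is an assumption about how the selection could be done, not the procedure used for this dataset. The function name `filter_8k_filings` is hypothetical, and `ciks` is assumed to be a set of CIK strings derived from the company list (the internal structure of SNP500WMVT.rds is not described here).

```python
from datetime import date


def filter_8k_filings(index_lines, ciks,
                      start=date(2015, 1, 1), end=date(2019, 12, 31)):
    """Select 8-K filings for the given CIKs from master.idx lines."""
    filings = []
    for line in index_lines:
        parts = line.strip().split("|")
        if len(parts) != 5:
            continue  # preamble and separator lines are not pipe-delimited records
        cik, company, form_type, date_filed, filename = parts
        # Exact match: amendments (form type "8-K/A") are excluded here (assumption).
        if form_type != "8-K" or cik not in ciks:
            continue
        try:
            filed = date.fromisoformat(date_filed)
        except ValueError:
            continue  # skip records whose date is not in YYYY-MM-DD form
        if start <= filed <= end:
            filings.append({
                "cik": cik,
                "company": company,
                "date_filed": filed,
                # The Filename field is a path relative to the EDGAR archive root.
                "url": "https://www.sec.gov/Archives/" + filename,
            })
    return filings
```

The function only selects index records. Downloading the filings themselves is left out, and any downloader should respect the SEC's fair-access rules, including a declared User-Agent and rate limits.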


Université Paris 1 Panthéon-Sorbonne