Generative AI rewritten SEC filings

Published: 22 October 2024| Version 1 | DOI: 10.17632/xxpjm7pxbg.1
Contributor:
Sebastian Lehner

Description

The rewritten filings in this dataset were generated using large language models from OpenAI, specifically GPT-4o and GPT-4o-mini. The rewriting process focused on the Management Discussion and Analysis (MD&A) part (section 7 and section 2, respectively) of 10-K and 10-Q filings, with the goal of maintaining the original content while improving the sentiment. There are 11,266 10-K and 32,620 10-Q filings. The sample was selected ensuring neutrality by considering both the year and sector. This method ensures a balanced representation across different time periods and industries, avoiding biases related to specific sectors or years in the rewritten filings. The rewritten filings in this dataset are saved in text files named according to the format: cik + '_' + accession number of the filing + '_section' + section number + '_' + model + '.txt' The following query was used: Please rewrite the provided MD&A section of a 10-K filing. Your goal is to create a new version that maintains the original meaning, key details, and financial information, but with more positive wording and phrasing. Ensure the rewritten text is coherent, professionally written, and retains the appropriate tone for a financial report. Also, enhance the positive sentiment by highlighting achievements, growth, and opportunities, while preserving all factual content. \n\nOriginal Text:

Files

Steps to reproduce

Download MD&A sections of 10-K and 10-Q filings, select sample (time and sector neutral), use OpenAI's API to rewrite filings

Institutions

Bergische Universitat Wuppertal

Categories

Accounting, Economics, Finance, Generative Artificial Intelligence, Generative Pre-Trained Transformer 4

Funding

OpenAI

6520

Licence