Generative AI rewritten SEC filings
Description
The rewritten filings in this dataset were generated using large language models from OpenAI, specifically GPT-4o and GPT-4o-mini. The rewriting process focused on the Management Discussion and Analysis (MD&A) part (section 7 and section 2, respectively) of 10-K and 10-Q filings, with the goal of maintaining the original content while improving the sentiment. There are 11,266 10-K and 32,620 10-Q filings. The sample was selected ensuring neutrality by considering both the year and sector. This method ensures a balanced representation across different time periods and industries, avoiding biases related to specific sectors or years in the rewritten filings. The rewritten filings in this dataset are saved in text files named according to the format: cik + '_' + accession number of the filing + '_section' + section number + '_' + model + '.txt' The following query was used: Please rewrite the provided MD&A section of a 10-K filing. Your goal is to create a new version that maintains the original meaning, key details, and financial information, but with more positive wording and phrasing. Ensure the rewritten text is coherent, professionally written, and retains the appropriate tone for a financial report. Also, enhance the positive sentiment by highlighting achievements, growth, and opportunities, while preserving all factual content. \n\nOriginal Text:
Files
Steps to reproduce
Download MD&A sections of 10-K and 10-Q filings, select sample (time and sector neutral), use OpenAI's API to rewrite filings
Institutions
Categories
Funding
OpenAI
6520