A corpus designed to study preprints produced during the Covid-19 crisis and to make comparative studies with the pre-pandemic period

Published: 11 March 2021 | Version 1 | DOI: 10.17632/rn9b93x5d4.1


This dataset was created to support comparative studies of abstracts from preprints issued in response to the COVID-19 pandemic (from 01/01/2020 to 12/04/2020) against abstracts produced in 2019, the closest pre-pandemic period. The dataset has 2 files:
- a txt file with the queries we ran in Dimensions and Lens to build the whole corpus and retrieve metadata
- a csv file with the metadata for all preprints in the corpus and the positive, negative and hedge words we extracted with the CorTexT Manager tool



Ecole des Ponts ParisTech


Linguistics, Natural Language Processing, Bibliometrics