Datasets for analysis of co-occurrence of cell lines, basal media and supplementation in Open Access biomedical literature

Published: 08-02-2019 | Version 1 | DOI: 10.17632/nvgy8pmkhk.1
Jessica Cox,
Ronald Daniel,
Darin McBeath


This dataset contains three key pieces of data:

1. journal-list-issn.csv: the journal names and ISSNs to which our corpus was limited.
2. mediaQueriesMendeley.csv: the 39 distinct queries we used to search the corpus, each referencing one of 27 unique basal media.
3. The folder 'Open Access sentences': 4 partitioned parquet files that together comprise a dataframe of 15,424 sentences, each of which appeared in one of the listed journals and matched a query for one of the basal media. The dataframe has the columns 'sentence', 'pii' and 'year'.
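As a rough illustration of how the sentence dataframe might be loaded and inspected, the Python sketch below reads the parquet partitions with pandas and counts sentences per year. It is only a sketch under stated assumptions: the helper names load_sentences and sentences_per_year are hypothetical, the partition files are assumed to end in .parquet, and pandas with a parquet engine (pyarrow or fastparquet) is assumed to be installed. Only the folder name and the column names 'sentence', 'pii' and 'year' come from the description above.

```python
from collections import Counter
from pathlib import Path


def load_sentences(folder):
    """Read every parquet partition under `folder` into one DataFrame.

    Assumption: the partition files carry a .parquet extension.
    """
    # Imported here so the counting helper below works without pandas.
    import pandas as pd  # requires pyarrow or fastparquet for parquet support

    parts = sorted(Path(folder).rglob("*.parquet"))
    if not parts:
        raise FileNotFoundError(f"no .parquet files found under {folder!r}")
    frames = [pd.read_parquet(p) for p in parts]
    return pd.concat(frames, ignore_index=True)


def sentences_per_year(records):
    """Count records per 'year' value.

    `records` is any iterable of mappings with a 'year' key, for example
    the result of DataFrame.to_dict("records").
    """
    return Counter(r["year"] for r in records)
```

With the dataset downloaded, usage might look like df = load_sentences("Open Access sentences") followed by sentences_per_year(df.to_dict("records")). If the description above holds, len(df) should come to 15,424.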