Open Data Created by Elsevier Research and Development teams

Contributor(s): Elsevier Team

Description of this collection

A collection of open datasets created by different groups across Elsevier in collaboration with our research collaboration partners. In line with FAIR data practices, these data are openly shared to foster research and promote reproducibility. Our current research in data science spans natural language processing, fact extraction and entity identification. We also have projects studying research itself through the lenses of gender, researcher mobility, FAIR data use, peer review and the impact of sustainable development goals.

Information

Published: 23 Aug 2021

Institutions

Elsevier BV, Elsevier Ltd, Elsevier Labs, Elsevier Inc Rockville Office, Elsevier Ltd, Elsevier GmbH, Elsevier Inc San Diego, Elsevier Beijing Ltd, Elsevier Inc Cambridge, Elsevier Ltd Exeter

Datasets within this collection

31 results

Data for report "Artificial Intelligence: How knowledge is created, transferred, and used"
Hellwig, Joerg, Huggett, Sarah, Siebert, Mark
Published 24 January 2024 | Elsevier BV
These are the underlying data for our report "Artificial Intelligence: How knowledge is created, transferred, and used", published in 2018. The data can be used to reconstruct the graphs in the report.
- Dataset
October 2023 data-update for "Updated science-wide author databases of standardized citation indicators"
Ioannidis, John P.A.
Published 4 October 2023 | Elsevier BV
Citation metrics are widely used and misused. We have created a publicly available database of top-cited scientists that provides standardized information on citations, h-index, co-authorship adjusted hm-index, citations to papers in different authorship positions and a composite indicator (c-score). Separate data are shown for career-long and, separately, for single recent year impact. Metrics with and without self-citations and ratio of citations to citing papers are given. Scientists are classified into 22 scientific fields and 174 sub-fields according to the standard Science-Metrix classification. Field- and subfield-specific percentiles are also provided for all scientists with at least 5 papers. Career-long data are updated to end-of-2022 and single recent year data pertain to citations received during calendar year 2022. The selection is based on the top 100,000 scientists by c-score (with and without self-citations) or a percentile rank of 2% or above in the sub-field. This version (6) is based on the October 1, 2023 snapshot from Scopus, updated to end of citation year 2022. This work uses Scopus data provided by Elsevier through ICSR Lab (https://www.elsevier.com/icsr/icsrlab). Calculations were performed using all Scopus author profiles as of October 1, 2023. If an author is not on the list it is simply because the composite indicator value was not high enough to appear on the list. It does not mean that the author does not do good work. PLEASE ALSO NOTE THAT THE DATABASE HAS BEEN PUBLISHED IN AN ARCHIVAL FORM AND WILL NOT BE CHANGED. The published version reflects Scopus author profiles at the time of calculation. We thus advise authors to ensure that their Scopus profiles are accurate. REQUESTS FOR CORRECTIONS OF THE SCOPUS DATA (INCLUDING CORRECTIONS IN AFFILIATIONS) SHOULD NOT BE SENT TO US.
They should be sent directly to Scopus, preferably by use of the Scopus to ORCID feedback wizard (https://orcid.scopusfeedback.com/) so that the correct data can be used in any future annual updates of the citation indicator databases. The c-score focuses on impact (citations) rather than productivity (number of publications) and it also incorporates information on co-authorship and author positions (single, first, last author). If you have additional questions, please read the 3 associated PLoS Biology papers that explain the development, validation and use of these metrics and databases. (https://doi.org/10.1371/journal.pbio.1002501, https://doi.org/10.1371/journal.pbio.3000384 and https://doi.org/10.1371/journal.pbio.3000918). Finally, we alert users that all citation metrics have limitations and their use should be tempered and judicious. For more reading, we refer to the Leiden manifesto: https://www.nature.com/articles/520429a
- Dataset
COVID-19: Public health, and societal and psychological impacts datasets
Elsevier Team
Published | Elsevier BV
- Collection
COVID-19: Epidemiology & infectious modelling datasets
Elsevier Team
Published | Elsevier BV
- Collection
COVID-19: Genetics, genomics & molecular structure datasets
Elsevier Team
Published | Elsevier BV
- Collection
COVID-19: Vaccine, prevention, diagnosis & treatment datasets
Elsevier Team
Published | Elsevier BV
- Collection
Mendeley Data FAIRest Datasets
Alberto Zigoni, Rachael Delevante, Wouter Haak
Published | Elsevier BV
- Collection
Elsevier OA CC-BY Corpus
Kershaw, Daniel, Koeling, Rob
Published 16 September 2020 | Elsevier BV
This is a corpus of 40k (40,001) open access (OA) CC-BY articles from across Elsevier’s journals. It represents the first cross-discipline research dataset at this scale to support NLP and ML research. This dataset was released to support the development of ML and NLP models targeting science articles from across all research domains. While the release builds on other datasets designed for specific domains and tasks, it will allow for similar datasets to be derived or for the development of models which can be applied and tested across domains.
- Dataset
ChEMU dataset for information extraction from chemical patents
Verspoor, Karin, Nguyen, Dat Quoc, Akhondi, Saber A., Druckenbrodt, Christian, Thorne, Camilo et al.
Published 12 September 2020 | Mendeley Data
The discovery of new chemical compounds and their synthesis process is of great importance to the chemical industry. Patent documents contain critical and timely information about newly discovered chemical compounds, providing a rich resource for chemical research in both academia and industry. Chemical patents are often the initial venues where a new chemical compound is disclosed. Only a small proportion of chemical compounds are ever published in journals and these publications can be delayed by up to 3 years after the patent disclosure. In addition, chemical patent documents usually contain unique information, such as reaction steps and experimental conditions for compound synthesis and mode of action. These details are crucial for the understanding of compound prior art, and provide a means for novelty checking and validation. Due to the high volume of chemical patents, approaches that enable automatic information extraction from these patents are in demand. To develop natural language processing methods for large-scale mining of chemical information from patent texts, a corpus is created providing chemical patent snippets and annotated entities and reaction steps.
- Dataset
The researcher journey through a gender lens
Jayabalasingham, Bamini, Collins, Tom, Kuiper-Hoyng, Liliane, Zhang, Jin, Roberge, Guillaume
Published 17 August 2020 | Elsevier BV
Data underlying the analyses in chapters 1, 2, 3, and 5 of the report "The researcher journey through a gender lens" (www.elsevier.com/connect/gender-report), which analyses the researcher journey using a gender lens. Data on authors, grantees and patent applicants pertain to researchers active during two periods, 16 geographies, and 26 subject areas and 11 sub-fields of medicine. These data are provided at the aggregated level.
- Dataset
