Data for "Updated science-wide author databases of standardized citation indicators"

Published: 08-10-2020 | Version 2 | DOI: 10.17632/btchxktzyw.2
Jeroen Baas,
Kevin Boyack,
John Ioannidis


Citation metrics are widely used and misused. We have created a publicly available database of 100,000 top scientists that provides standardized information on citations, h-index, co-authorship adjusted hm-index, citations to papers in different authorship positions, and a composite indicator. Separate data are shown for career-long and single-year impact. Metrics with and without self-citations and the ratio of citations to citing papers are given. Scientists are classified into 22 scientific fields and 176 sub-fields. Field- and subfield-specific percentiles are also provided for all scientists who have published at least 5 papers. Career-long data are updated to end-of-2019. The dataset and code provide an update to previously released (version 1) data under; The version 2 dataset is based on the May 06, 2020 snapshot from Scopus and is updated to citation year 2019. In addition to the time period and datacut update, it provides a longer list of authors: it also includes the top 2% for every subfield.
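
As a rough illustration of two of the indicators named above, the sketch below computes an h-index and a co-authorship adjusted hm-index from per-paper citation counts. It is not the code released with the dataset. The function names and the input layout are assumptions made for this example, and the hm-index follows the usual fractional-counting definition (each paper adds 1 / number of authors to an effective rank), which may differ in detail from the released implementation.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def hm_index(papers):
    """Co-authorship adjusted index using fractional paper counting.

    `papers` is a list of (citations, n_authors) pairs (hypothetical layout).
    Each paper adds 1 / n_authors to an effective rank; the result is the
    largest effective rank that does not exceed the citation count of the
    paper reached at that rank.
    """
    ranked = sorted(papers, key=lambda p: p[0], reverse=True)
    effective_rank = 0.0
    hm = 0.0
    for cites, n_authors in ranked:
        effective_rank += 1.0 / n_authors
        if cites >= effective_rank:
            hm = effective_rank
        else:
            break
    return hm
```

With single-author papers only, the two functions agree: for citation counts 10, 8, 5, 4, 3 both give 4. If every paper has two authors, the effective rank grows by 0.5 per paper, so the hm-index comes out lower than the h-index.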

Steps to reproduce

Code is provided with the dataset and runs on the ICSR Lab data sharing platform using Scopus data. It is written in Python (PySpark) and can be used with other datasets on any PySpark platform.
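
The released code is PySpark. As a library-free sketch of the selection rule described in the abstract (the overall top 100,000 plus the top 2% of every subfield), the plain-Python example below may help when checking results on a small sample. Several points are assumptions for illustration and are not confirmed by the source: the record keys `id`, `subfield`, `score` and `papers`, ranking by the composite indicator, restricting the pool to authors with at least 5 papers, and rounding the 2% cutoff up.

```python
from collections import defaultdict


def select_authors(authors, top_n=100_000, subfield_percent=2, min_papers=5):
    """Return ids of authors in the overall top_n or the top share of a subfield.

    `authors` is a list of dicts with hypothetical keys:
    'id', 'subfield', 'score' (composite indicator) and 'papers'.
    """
    # Assumption: only authors with at least `min_papers` papers are ranked.
    eligible = [a for a in authors if a["papers"] >= min_papers]
    by_score = sorted(eligible, key=lambda a: a["score"], reverse=True)

    # Overall top_n by score.
    selected = {a["id"] for a in by_score[:top_n]}

    # Group by subfield; each group keeps the descending score order.
    groups = defaultdict(list)
    for a in by_score:
        groups[a["subfield"]].append(a)

    # Add the top `subfield_percent` percent of every subfield.
    for members in groups.values():
        # Integer ceiling of len * percent / 100 (rounding up is an assumption).
        cutoff = (len(members) * subfield_percent + 99) // 100
        selected.update(a["id"] for a in members[:cutoff])
    return selected
```

On a real PySpark platform the same idea would more naturally be written with window functions partitioned by subfield; the version here is only meant to make the rule concrete.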
