World’s Top 2% of Scientists list by Stanford University: An Analysis of its Robustness
John Ioannidis and co-authors created a publicly available database of the top-cited scientists in the world. This database, intended to address the misuse of citation metrics, has generated a lot of interest among the scientific community, institutions, and media. Many institutions use it as a yardstick to assess the quality of researchers. At the same time, some people view the list with skepticism, citing problems with the methodology used. Two separate databases are provided, one based on career-long impact and one on single recent-year impact. Both are built from Scopus data from Elsevier [1-3]. The scientists included are classified into 22 scientific fields and 174 sub-fields.
The parameters considered in this analysis are total citations from 1996 to 2022 (nc9622), h-index in 2022 (h22), c-score, and world rank based on c-score (Rank ns). Citations excluding self-citations are used in all cases (indicated as ns). For the single-year database, citations during 2022 (nc2222) are considered instead of nc9622. To evaluate the robustness of the c-score-based ranking, I analyzed in detail the metric parameters of the Nobel laureates in physics, chemistry, and medicine of the last 25 years (1998-2022) and compared them with those of the top 100 rank holders in the list. The latest (2022) career-long and single-year databases were used for this analysis. The details of the analysis are presented below.
Although the article says that selection is based on the top 100,000 scientists by c-score (with and without self-citations) or a percentile rank of 2% or above in the sub-field, the actual career-based ranking list has 204,644 names, and the single-year database contains 210,199 names. The published list therefore contains approximately the top 4% of scientists.
In the career-based rank list, the person with the lowest rank (4,809,825) had an nc9622, h22, and c-score of 41, 3, and 1.3632, respectively, whereas the person ranked No. 1 had 345,061, 264, and 5.5927, respectively. Three people on the list had fewer than 100 citations during 1996-2022, 1,155 people had an h22 below 10, and 6 people had a c-score below 2.
In the single-year rank list, the person with the lowest rank (6,547,764) had an nc2222, h22, and c-score of 1, 1, and 0.6, respectively, whereas the person ranked No. 1 had an nc2222, h22, and c-score of 34,582, 68, and 5.3368, respectively. In this list, 4,463 people had fewer than 100 citations in 2022, 71,512 people had an h22 below 10, and 313 people had a c-score below 2.
The entry of many authors with a single-digit h-index and a very meager total number of citations indicates serious shortcomings of the c-score-based ranking methodology.
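The threshold counts quoted above can be reproduced with a few lines of Python. The following is a minimal sketch, not the script actually used for this analysis. The keys `nc9622`, `h22`, and `c` are assumed names for the citation, h-index, and c-score columns (the published spreadsheets may label them differently). The first and last sample rows reuse the No. 1 and lowest-ranked career values reported above; the middle row is made up for illustration.

```python
from typing import Dict, Iterable, List

Row = Dict[str, float]


def count_below(rows: Iterable[Row], key: str, threshold: float) -> int:
    """Count rows whose value for `key` is strictly below `threshold`."""
    return sum(1 for row in rows if row[key] < threshold)


def summarize(rows: List[Row], cites_key: str = "nc9622") -> Dict[str, int]:
    """Apply the three thresholds used in the text to a list of author rows.

    `cites_key` is an assumed column name; pass "nc2222" for the
    single-year list.
    """
    return {
        "total": len(rows),
        "cites_lt_100": count_below(rows, cites_key, 100),
        "h_lt_10": count_below(rows, "h22", 10),
        "c_lt_2": count_below(rows, "c", 2.0),
    }


sample = [
    {"nc9622": 345061, "h22": 264, "c": 5.5927},  # No. 1 in the career list
    {"nc9622": 5200, "h22": 35, "c": 3.4},        # made-up row, illustration only
    {"nc9622": 41, "h22": 3, "c": 1.3632},        # lowest-ranked career entry
]

summary = summarize(sample)
print(summary)
```

For the full database, the rows could be read from a CSV export of the spreadsheet with `csv.DictReader`, converting each value to a number before calling `summarize`.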
Steps to reproduce
Although most Nobel laureates in science appear in the career-long world ranking list based on the Scopus database, some of the physics and chemistry laureates of the last 25 years do not (12 out of 71 in physics, 2 out of 66 in chemistry). All 62 laureates in medicine appear in the list. Four physics Nobel laureates [Andrea Ghez (2020), Rainer Weiss (2017), Arthur B. McDonald (2015), and Albert Fert (2007)] do not appear in the career list but were found in Semantic Scholar or Google Scholar, probably because their names are not indexed in the Scopus database. A few Nobel laureates working in industry (e.g., 2023 Chemistry Nobel laureate Aleksey Yekimov of Nanocrystals Technology Inc., USA) also did not appear in the list, probably because technology-related work yields few publications (though it may yield patents).
The ranks, total citations, h-index, composite scores, and total publications of most Nobel laureates were much lower than those of the top 100 rank holders in the list. The average number of papers published by the top 100 c-score-based rank holders was much higher than that of the Nobel laureates. Among the top 100 authors in the Stanford list, only four were Nobel laureates in science [Andre Geim (2010 Physics) at rank 62; John B. Goodenough (2019 Chemistry) at rank 40; Alan J. Heeger (2000 Chemistry) at rank 99; and Gregg L. Semenza (2019 Medicine) at rank 17].
Though misuse of citations is possible in Google Scholar, it is not possible in Scopus. In this sense, the Scopus database is authentic, though it does not capture all citations because many journals are not indexed by Scopus. Although the c-score-based ranking captures the overall impact of a scientist's research papers, it does not give an accurate account of the impact of original research papers and suffers from the same problems as citation-based or h-index-based ranking.
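The laureate lookup described above amounts to matching a list of names against the ranking list and then checking which matches fall within the top 100. The sketch below illustrates that step under simplifying assumptions: names are matched by exact string after lowercasing and whitespace cleanup, whereas the Scopus-based list may spell names differently, so a real comparison would likely need manual checking. The function names are hypothetical, and the example uses only the four ranks and two of the missing names mentioned in the text.

```python
from typing import Dict, List, Tuple


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(name.lower().split())


def match_laureates(
    laureates: List[str], ranks: Dict[str, int], top_n: int = 100
) -> Tuple[Dict[str, int], List[str], List[str]]:
    """Split laureates into found (with rank) and missing; list those in the top N."""
    index = {normalize(name): rank for name, rank in ranks.items()}
    found: Dict[str, int] = {}
    missing: List[str] = []
    for name in laureates:
        rank = index.get(normalize(name))
        if rank is None:
            missing.append(name)
        else:
            found[name] = rank
    # Laureates within the top N, ordered from best to worst rank.
    in_top = sorted((n for n, r in found.items() if r <= top_n), key=found.get)
    return found, missing, in_top


# Ranks as reported in the text; a real run would load the full ranking list.
ranks = {
    "Gregg L. Semenza": 17,
    "John B. Goodenough": 40,
    "Andre Geim": 62,
    "Alan J. Heeger": 99,
}
laureates = [
    "Andre Geim", "John B. Goodenough", "Alan J. Heeger",
    "Gregg L. Semenza", "Andrea Ghez", "Rainer Weiss",
]

found, missing, in_top = match_laureates(laureates, ranks)
print(len(found), "found;", "missing:", missing)
print("in top 100:", in_top)
```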
The methodology used for c-score estimation has several shortcomings: lack of normalization of citations with respect to the total number of publications, co-authors, and review papers; non-inclusion of publications in journals not indexed by Scopus; non-mapping of author names in the Scopus database; excessive citations of review papers; extra benefits for first, last, and single authors; lack of credit for corresponding authors; and anomalies in subject classification.
[1] Ioannidis JPA, et al. (2019) PLoS Biol 17(8): e3000384. pmid:31404057
[2] Ioannidis JPA, et al. (2020) PLoS Biol 18(10): e3000918.
[3] Ioannidis JPA. "Updated science-wide author databases of standardized citation indicators", Version 6, 4 October 2023.