Dataset of scholarly publications for empirical evaluation of research productivity and trends of the ISI over two decades (1991-2010)

Published: 26-11-2019 | Version 1 | DOI: 10.17632/6m2s6rt48x.1
Jiban K. Pal


Quantifying research performance is a clear necessity for scientific institutions and enterprises in many academic pursuits. Scientometric measurements are recognized as an indispensable tool for evidence-based judgment of the research activities of an individual or an institute. Scholarly publications have long been the most widely accepted basis for evaluating research productivity, often in combination with citation counts. This dataset enumerates quantifiable characteristics of the scholarly publications of the Indian Statistical Institute (ISI) from 1991 to 2010. It documents the publications thoroughly, along with their citations, for mapping the research of ISI over two distinct decades. It also presents the publications along different dimensions for empirical evaluation of research productivity and trends in relation to the citation impact of the Institute. Compiling this publication dataset was time-consuming and was done rigorously. Scholarly publications that appeared during the period with at least one author affiliated with ISI in the by-line were first retrieved from multiple sources (viz. Scopus®, Web of Science™, MathSciNet®, EconLit), then converted and captured in MS-Excel format. The data-conversion technique (from BibTeX to Excel, through CDS/ISIS using Fangorn) was extremely helpful for capturing large amounts of data in the least possible time. Strenuous effort then went into filtering the data and validating it against the annual reports of the Institute. Validation was a tedious job, but it gave valuable experience in making the data elements findable and accessible using robust techniques. Citations were counted (as of December 2017) through Google Scholar, Scopus®, and Web of Science™ to measure the academic influence of ISI publications.
The empirical dataset (comprising 7188 records) was finally consolidated and organized systematically for sharing and reuse. It includes the ‘raw data’ prepared for the author's doctoral dissertation. A detailed description of the data collection (i.e. identification, gathering, conversion, filtering, validation, and consolidation) is provided in Chapter 4 of the dissertation. The researcher has also used the dataset in some of his articles. The dataset is therefore an authoritative basis for studies on different aspects of evaluative scientometrics. The author firmly believes that researchers will benefit from this dataset, and that it will be used and re-used to widen the coverage of research and develop better insights.


Steps to reproduce

The dataset can be downloaded in CSV (Comma Separated Values, plain text) and XLSX (Microsoft Excel worksheet, binary) formats for academic and research use. In addition to the dataset, the Excel file contains two separate sheets: a ‘guide-to-use’ and an ‘authorship counting chart’. The dataset must not be used in connection with any personal treatment or psychosocial intervention. It is, however, subject to a statement that does not allow third parties (other than academic researchers) to use it without prior consultation with the author. No part of this dataset may be used in a way that disregards the intent of the researcher and the academic interest it serves.
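As a starting point for reuse, the CSV export can be read with Python's standard library. The following is only a minimal sketch: the file name `isi_publications.csv` and the column name `Year` are hypothetical placeholders, not taken from the dataset, so check the ‘guide-to-use’ sheet for the actual field names before running it.

```python
import csv
from collections import Counter


def load_records(path):
    """Read the CSV export into a list of dicts, one per publication record."""
    # utf-8-sig tolerates a byte-order mark, which Excel often writes to CSV files.
    with open(path, newline="", encoding="utf-8-sig") as fh:
        return list(csv.DictReader(fh))


def count_by(records, column):
    """Count records per distinct value of `column`, skipping empty cells."""
    values = ((row.get(column) or "").strip() for row in records)
    return Counter(value for value in values if value)


# Example usage (file and column names are assumptions, see the note above):
# records = load_records("isi_publications.csv")
# print(len(records))               # the description states 7188 records
# print(count_by(records, "Year"))  # publications per year
```

Comparing the number of loaded rows with the 7188 records stated in the description is a quick way to confirm that the download is complete.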