Dataset for Bibliometric Data-Driven Research Team Formation
Description
The dataset contains CSV tables describing the individual expertise and teamwork skills of scholars affiliated with Politehnica University of Timisoara, Romania. The variables in the dataset were computed from IEEE Xplore bibliometric metadata for the period 2010-2022, extracted on July 4, 2023 using the IEEE Xplore Metadata Search API. A total of 1992 publications, authored by 1179 researchers, were collected.
The domains of expertise are modeled by key terms extracted from paper metadata. Accordingly, the dataset contains four collections: a) Case#1 - key terms extracted from the 'title', 'keywords', and 'abstract' metadata fields (6493 key terms); b) Case#2 - key terms extracted from the 'title' and 'keywords' metadata fields (2651 key terms); c) Case#3 - key terms extracted from the 'title' metadata field (1844 key terms); and d) Case#4 - key terms extracted from the 'keywords' metadata field (1254 key terms). Each of these collections contains nine CSV tables.
For each anonymized author (columns 'IDx'), table "Individual_Expertise_and_Collaborators" gives the total number of publications, citations, citations in patents, downloads, coauthors with the same affiliation, and coauthors with other affiliations in rows 'publications', 'citations', 'citations_patents', 'downloads', 'internal_collaborators', and 'external_collaborators', respectively. Four tables describe the collaborations between pairs of scholars: the number of coauthored publications (table "Collaborations_number"), the number of citations received by coauthored publications (table "Collaborations_citations"), the number of citations in patents received by coauthored publications (table "Collaborations_citations_patents"), and the number of downloads received by coauthored publications (table "Collaborations_downloads").
The remaining four tables share the same structure: the columns represent the identified key terms and the rows represent the anonymized authors. In these tables, a cell refers to the author's publications that contain the specified key term and gives their number (table "KeyTerms_number"), the number of citations they received (table "KeyTerms_citations"), the number of citations in patents they received (table "KeyTerms_citations_patents"), or the number of downloads they received (table "KeyTerms_downloads").
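The two table orientations described above (authors as columns in the individual-expertise table, authors as rows in the key-term tables) can be read as in the following minimal Python sketch. The inline CSV snippets, the header labels 'metric' and 'author', the key terms, and all numeric values are invented for illustration; the actual files may use different delimiters or header cells, and the helper names are hypothetical.

```python
import csv
import io

# Toy stand-ins for two of the nine tables. The real file layout (delimiter,
# header labels, first-column name) is an assumption and should be checked.
INDIVIDUAL_CSV = """metric,ID1,ID2,ID3
publications,12,3,7
citations,40,5,21
citations_patents,1,0,0
downloads,900,120,450
internal_collaborators,4,2,3
external_collaborators,6,0,1
"""

KEYTERMS_NUMBER_CSV = """author,machine learning,power electronics
ID1,5,0
ID2,0,3
ID3,2,4
"""


def read_individual(text):
    """Return {author: {metric: value}} from a table with authors as columns."""
    rows = list(csv.reader(io.StringIO(text)))
    authors = rows[0][1:]
    profile = {author: {} for author in authors}
    for row in rows[1:]:
        metric = row[0]
        for author, value in zip(authors, row[1:]):
            profile[author][metric] = int(value)
    return profile


def read_key_terms(text):
    """Return {author: {key_term: value}} from a table with authors as rows."""
    rows = list(csv.reader(io.StringIO(text)))
    terms = rows[0][1:]
    return {row[0]: dict(zip(terms, map(int, row[1:]))) for row in rows[1:]}


def rank_authors(key_terms, term):
    """List (author, value) pairs with a non-zero value for `term`, highest first."""
    scored = [(a, v[term]) for a, v in key_terms.items() if v.get(term, 0) > 0]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


individual = read_individual(INDIVIDUAL_CSV)
key_terms = read_key_terms(KEYTERMS_NUMBER_CSV)
print(individual["ID1"]["citations"])
print(rank_authors(key_terms, "power electronics"))
```

Ranking authors by a key-term column in this way is one possible starting point for shortlisting candidates with a required expertise; the same reader would apply to the citations, patent-citations, and downloads variants of the key-term tables if they follow the same layout.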
Steps to reproduce
On July 4, 2023, we used the IEEE Xplore Metadata Search API to retrieve metadata for papers published in the interval 2010-2022 with at least one author from Politehnica University of Timisoara, Romania. This metadata corpus contained 1992 records corresponding to 1179 researchers from that university. The key terms used to derive the scholars' domains of expertise were extracted using the TagMe entity-linking procedure described by Paolo Ferragina and Ugo Scaiella [https://doi.org/10.1145/1871437.1871689; https://doi.org/10.1109/MS.2011.122].
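Once each publication record carries its anonymized authors, its extracted key terms, and its citation count, the collaboration and key-term tables can in principle be obtained by simple aggregation. The Python sketch below illustrates one way to do this; it is not the authors' pipeline. The record fields ('authors', 'key_terms', 'citations'), the sample values, and the function name `aggregate` are assumptions made for illustration, and the retrieval and TagMe annotation steps are not reproduced.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical, already-annotated publication records; field names and values
# are illustrative and do not come from the dataset.
RECORDS = [
    {"authors": ["ID1", "ID2"], "key_terms": ["fuzzy logic"], "citations": 4},
    {"authors": ["ID1", "ID3"], "key_terms": ["fuzzy logic", "robotics"], "citations": 10},
    {"authors": ["ID1", "ID2"], "key_terms": ["robotics"], "citations": 1},
]


def aggregate(records):
    """Build pairwise collaboration counts and per-author key-term counts."""
    collab_number = defaultdict(int)      # (author_a, author_b) -> coauthored papers
    collab_citations = defaultdict(int)   # (author_a, author_b) -> citations of those papers
    term_number = defaultdict(int)        # (author, key_term) -> papers with that term
    term_citations = defaultdict(int)     # (author, key_term) -> citations of those papers
    for rec in records:
        authors = sorted(set(rec["authors"]))
        # Every unordered pair of coauthors counts as one collaboration.
        for pair in combinations(authors, 2):
            collab_number[pair] += 1
            collab_citations[pair] += rec["citations"]
        # Each author is credited once per paper for each distinct key term.
        for author in authors:
            for term in set(rec["key_terms"]):
                term_number[(author, term)] += 1
                term_citations[(author, term)] += rec["citations"]
    return collab_number, collab_citations, term_number, term_citations


collab_number, collab_citations, term_number, term_citations = aggregate(RECORDS)
print(dict(collab_number))
print(dict(term_number))
```

The same loop would extend to patent citations and downloads by accumulating those fields alongside 'citations', assuming they are available in each record.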