Manually collected Citation Context Datasets from moderately cited Biomedical Publications

Published: 17 May 2021 | Version 1 | DOI: 10.17632/ng6zmxt4n4.1
Contributors:
Toluwase Asubiaro

Description

This dataset consists of citation contexts collected from one hundred (100) moderately cited biomedical articles sampled from the MEDLINE database. The 100 sampled articles received 7,317 citations (maximum = 179, minimum = 31, average = 73.17), from which 11,228 citation contexts were extracted. Each citation context was manually identified as the span of text around the citation marker that describes the contribution referenced from the cited article. Unlike other studies, which identify citation contexts using a predetermined window of text before and after the citation marker, the citation contexts in this dataset were identified by reading the text around the citation marker to understand the context in which the cited publication was referenced. Citation context data collected using a predetermined window of words or paragraphs are prone to errors (overrepresentation or underrepresentation). The first sheet in each Excel file contains bibliographic details of the cited article, while the second and third sheets contain bibliographic details of the citing articles and the citation contexts, respectively. Four fields were collected on the third sheet, which contains the citation contexts: No, title, number of mentions, and citation contexts. "No" is a citing article's unique serial number, which corresponds to the serial number of the same citing article on the second sheet of the Excel file. "Title" is the title of the citing article, "number of mentions" is the number of times the cited article is referenced in the citing article, and "citation context" is the text that represents the contribution of the cited article. This dataset is useful for citation identification, classification, and weighting studies. It was collected as part of the first author's doctoral thesis.
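The relationship between the second and third sheets can be illustrated with a minimal Python sketch. It assumes the sheets have already been read into memory and that a citing article may contribute more than one context; the class names, the `group_contexts` helper, and all row values are hypothetical placeholders, not the dataset's actual column headers or contents.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CitingArticle:
    """One row of the second sheet (assumed layout): a citing article."""
    no: int      # "No": serial number shared with the third sheet
    title: str


@dataclass
class CitationContext:
    """One row of the third sheet, with the four fields named in the description."""
    no: int                  # matches CitingArticle.no
    title: str               # title of the citing article
    number_of_mentions: int  # times the cited article is referenced
    citation_context: str    # manually identified context text


def group_contexts(contexts):
    """Collect the context texts of each citing article, keyed by its "No"."""
    grouped = defaultdict(list)
    for row in contexts:
        grouped[row.no].append(row.citation_context)
    return dict(grouped)


# Made-up placeholder rows, not taken from the dataset.
articles = [
    CitingArticle(1, "Placeholder citing article A"),
    CitingArticle(2, "Placeholder citing article B"),
]
contexts = [
    CitationContext(1, "Placeholder citing article A", 2, "Placeholder context one."),
    CitationContext(1, "Placeholder citing article A", 2, "Placeholder context two."),
    CitationContext(2, "Placeholder citing article B", 1, "Placeholder context three."),
]

grouped = group_contexts(contexts)
titles = {a.no: a.title for a in articles}

# Print each citing article's title with the number of contexts found for it.
for no, texts in grouped.items():
    print(titles[no], len(texts))
```

Grouping by "No" rather than by title keeps the link to the second sheet unambiguous even if two citing articles happen to share a title.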

Institutions

Western University

Categories

Natural Language Processing, Content Analysis, Scholarly Communication, Bibliometrics

Licence