PICO Statements Dataset

Published: 18 February 2020 | Version 1 | DOI: 10.17632/p5rbn8mygp.1


The PICO Statements dataset is a collection of 130 abstracts from Randomized Clinical Trials and Controlled Trials, manually annotated by medical practitioners to identify sentences that not only contain all four PICO elements but also answer clinically stated questions. These sentences are referred to as PICO Statements.

In Evidence-Based Medicine (EBM), medical practitioners use the PICO framework to narrow the search space and enable faster decision-making about treatment procedures. The framework is named after its four elements: Population, Intervention, Comparator and Outcome. Previous datasets focus either on assigning whole sentences to a single PICO element or, more recently, on identifying the sequence of tokens in a sentence that describes each element. As in previous research, we treat Intervention and Comparator as one element in our annotation scheme. For each sentence, we annotate with binary labels the presence of each PICO element individually and whether the sentence is a PICO Statement.

The dataset is offered, one abstract per file, in two formats:
1) XML format, for sentence classification. The XML format presents each abstract, along with its title, annotated at the sentence level, with all four annotations present for each sentence in binary form. The XML Schema (.xsd) files are also available in the miscellaneous folder.
2) pseudo-IOB format, for PICO entity prediction. The pseudo-IOB format presents each abstract, along with its title, annotated at the token level, with the same binary annotations repeated for each token in the sentence. The binary annotations in the pseudo-IOB format correspond to the PICO elements in the following order: Population, Intervention/Comparator, Outcome, PICO Statement.

Both formats contain the same abstracts, and the file names correspond to the PubMed IDs of the publications from which the abstracts originate.
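As a rough illustration of how the pseudo-IOB annotations might be read, the sketch below parses token lines into labelled records. The exact file layout is not specified in this description, so the code assumes (hypothetically) that each non-empty line holds a token followed by four whitespace-separated 0/1 flags, in the order Population, Intervention/Comparator, Outcome, PICO Statement, and that sentences are separated by blank lines. The names `TokenAnnotation` and `parse_pseudo_iob` and the sample text are illustrative only and are not part of the dataset.

```python
from typing import Iterable, List, NamedTuple


class TokenAnnotation(NamedTuple):
    """One token with its four binary labels (field names are illustrative)."""
    token: str
    population: bool
    intervention_comparator: bool
    outcome: bool
    pico_statement: bool


def parse_pseudo_iob(lines: Iterable[str]) -> List[List[TokenAnnotation]]:
    """Group token lines into sentences.

    ASSUMPTION: "<token> <P> <I/C> <O> <PICO Statement>" per line,
    with a blank line between sentences. Adjust to the real files.
    """
    sentences: List[List[TokenAnnotation]] = []
    current: List[TokenAnnotation] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        if len(parts) != 5 or any(flag not in ("0", "1") for flag in parts[1:]):
            raise ValueError(f"Unexpected line layout: {raw!r}")
        token, *flags = parts
        current.append(TokenAnnotation(token, *(flag == "1" for flag in flags)))
    if current:
        sentences.append(current)
    return sentences


# Made-up sample text, not taken from the dataset.
sample = """Patients 1 1 1 1
improved 1 1 1 1

Methods 0 0 0 0
"""
sentences = parse_pseudo_iob(sample.splitlines())
```

Because the same binary annotations repeat for every token in a sentence, the sentence-level labels can be read from any single token, for example `sentences[0][0].pico_statement`.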



Aristotle University of Thessaloniki


Evidence-Based Medicine, Natural Language Processing, Machine Learning, Clinical Decision Making