PMC clinical trial disentangled tables data set
Description of this data
The database was created by processing 6558 clinical trial articles from the 2014 PubMed Central (PMC) public sample. The articles were obtained by matching PMC and Medline documents; a document was selected if its Medline publication type contained the word "Clinical".
The documents were processed with the TableDisentangler tool, which created the majority of the database. The documents were then annotated using UMLS/MetaMap, through a script that is part of TableDisentangler and handles communication with MetaMap. Three information extraction case studies were performed on these data:
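The selection criterion above can be illustrated with a minimal sketch. It assumes Medline records have already been parsed into simple objects; the `MedlineRecord` class, its field names, and the case-insensitive match are hypothetical choices made for illustration and are not taken from the scripts used to build this dataset.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MedlineRecord:
    """Hypothetical, simplified view of a parsed Medline record."""
    pmid: str
    publication_types: List[str] = field(default_factory=list)


def is_clinical(record: MedlineRecord) -> bool:
    # Assumption: the match on the word "Clinical" is case-insensitive.
    return any("clinical" in pt.lower() for pt in record.publication_types)


def select_clinical(records: List[MedlineRecord]) -> List[MedlineRecord]:
    # Keep only records with at least one publication type containing "Clinical".
    return [r for r in records if is_clinical(r)]


# Made-up example records, for illustration only.
records = [
    MedlineRecord("1", ["Journal Article", "Randomized Controlled Trial"]),
    MedlineRecord("2", ["Clinical Trial", "Journal Article"]),
    MedlineRecord("3", ["Review"]),
    MedlineRecord("4", ["Controlled Clinical Trial"]),
]
selected = select_clinical(records)
```

Note that under this criterion a record typed only as "Randomized Controlled Trial" would not be selected, because that publication type does not contain the word "Clinical".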
- Extraction of patients' age
- Extraction of gender distribution
- Extraction of FEV1 measures (performed for COPD studies only)
The information extraction case studies were performed using the TabInOut tool, which generates rules for extracting information from tables.
The database schema is available at the following link: https://github.com/nikolamilosevic86/TableDisentangler/wiki/Database-schema
Files included in the dataset:
- Clinicaldata.zip - Contains the raw XML clinical documents from PMC
- Database.zip - Contains the database with the data processed using TableDisentangler and TabInOut
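As a starting point for working with the two archives, the following sketch lists the files inside a zip archive and their uncompressed sizes, using only the Python standard library. The helper name `summarize_archive` is hypothetical, and the sketch assumes the archives have been downloaded into the current working directory; it makes no assumption about the names or formats of the files inside them.

```python
import zipfile
from pathlib import Path


def summarize_archive(path):
    """Return (filename, uncompressed size in bytes) for each file in a zip archive."""
    with zipfile.ZipFile(path) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if not info.is_dir()  # skip directory entries
        ]


if __name__ == "__main__":
    # Assumption: both archives sit in the current working directory.
    for name in ("Clinicaldata.zip", "Database.zip"):
        if Path(name).exists():
            for filename, size in summarize_archive(name):
                print(f"{name}: {filename} ({size} bytes)")
        else:
            print(f"{name}: not found in the current directory")
```

Inspecting the listing before extraction shows how large the contents are and how they are organised.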
This data is associated with the following publication:
Cite this dataset
Milosevic, Nikola (2017), “PMC clinical trial disentangled tables data set”, Mendeley Data, v1 http://dx.doi.org/10.17632/wk53twxddf.1
The files associated with this dataset are licensed under a Creative Commons Attribution 4.0 International licence.