PMC clinical trial disentangled tables data set

Published: 19 May 2017 | Version 1 | DOI: 10.17632/wk53twxddf.1
Contributor:
Nikola Milosevic

Description

The database was created by processing 6558 clinical trial articles from the PubMed Central public sample 2014. The articles were obtained by matching PMC and Medline documents; the selected documents are those whose Medline publication type contains the word "Clinical". The documents were processed using the TableDisentangler tool, which creates the majority of the database. The documents were then annotated using UMLS/MetaMap and a script, part of the TableDisentangler tool, for communication with MetaMap.

Three information extraction case studies were performed on these data:
- Extraction of patients' age
- Extraction of gender distribution
- Extraction of FEV1 measures (performed for COPD studies only)

The case studies were performed using the TabInOut tool for generating table information extraction rules.

The database schema can be seen at the following link: https://github.com/nikolamilosevic86/TableDisentangler/wiki/Database-schema

Files included in the dataset:
- Clinicaldata.zip - Contains raw XML clinical documents from PMC
- Database.zip - Contains the database with data processed using TableDisentangler and TabInOut
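The document selection step described above (keeping articles whose Medline publication type contains the word "Clinical") can be illustrated with a minimal sketch. This is not the script used to build the dataset. The record layout, the `select_clinical` function name, and the sample identifiers are hypothetical, and a case-sensitive substring match is assumed.

```python
from typing import Dict, List

# Hypothetical record layout (an assumption, not the dataset's actual format):
# {"pmid": "<identifier>", "publication_types": ["<type>", ...]}
Record = Dict[str, object]


def select_clinical(records: List[Record]) -> List[Record]:
    """Keep records with at least one publication type containing 'Clinical'.

    Assumes a case-sensitive substring match on each publication type string.
    """
    selected = []
    for record in records:
        pub_types = record.get("publication_types", [])
        if any("Clinical" in pub_type for pub_type in pub_types):
            selected.append(record)
    return selected


# Made-up sample records, for illustration only.
sample = [
    {"pmid": "1", "publication_types": ["Journal Article", "Review"]},
    {"pmid": "2", "publication_types": ["Clinical Trial", "Journal Article"]},
    {"pmid": "3", "publication_types": ["Controlled Clinical Trial"]},
]

selected_ids = [record["pmid"] for record in select_clinical(sample)]
print(selected_ids)
```

Under this assumption, a record is kept only when the literal word "Clinical" appears in one of its publication types, so a type such as "Review" alone would not qualify.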

Institutions

The University of Manchester

Categories

Data Mining, Clinical Trials, Descriptive Tables, Data Processing, Text Mining
