DrugSemantics Gold Standard

Published: 16 Jun 2017 | Version 1 | DOI: 10.17632/fwc7jrc5jr.1

Description of this data

The DrugSemantics gold standard consists of five Summaries of Product Characteristics (SPCs) written in Spanish. The SPCs were retrieved from the Medicines Online Information Center (CIMA), which belongs to the Spanish Agency for Medicines and Health Products (AEMPS).

The corpus is annotated with 10 types of Named Entities (NEs) related to pharmacotherapeutic care: Chemical Composition, Disease, Drug, Excipient, Food, Medicament, Pharmaceutical Form, Route, Therapeutic Action and Unit of Measurement. It contains 2,241 NEs, 780 sentences and 226,729 tokens.
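The ten entity categories form a closed label set, so one way to recompute per-category counts is to map every label onto an enumeration and reject anything unexpected. The Python sketch below assumes the labels are stored with the English names listed above; the strings actually used in the XML files may differ (they could, for example, be in Spanish), so `EntityType` and `tally` are hypothetical helpers to adapt to the real data.

```python
from collections import Counter
from enum import Enum


class EntityType(Enum):
    # The ten NE categories named in the corpus description.
    # These string values are an assumption: the labels stored in the
    # XML files may be spelled differently.
    CHEMICAL_COMPOSITION = "Chemical Composition"
    DISEASE = "Disease"
    DRUG = "Drug"
    EXCIPIENT = "Excipient"
    FOOD = "Food"
    MEDICAMENT = "Medicament"
    PHARMACEUTICAL_FORM = "Pharmaceutical Form"
    ROUTE = "Route"
    THERAPEUTIC_ACTION = "Therapeutic Action"
    UNIT_OF_MEASUREMENT = "Unit of Measurement"


def tally(labels):
    """Count entity mentions per category.

    EntityType(label) raises ValueError for a label outside the set,
    so unexpected annotation types are not silently ignored.
    """
    counts = Counter(EntityType(label) for label in labels)
    # Report every category, including those with zero mentions.
    return {t: counts[t] for t in EntityType}


counts = tally(["Drug", "Route", "Drug"])
print(counts[EntityType.DRUG], counts[EntityType.ROUTE])  # 2 1
```

Failing loudly on an unknown label is deliberate: a silent mismatch between assumed and actual label strings would otherwise produce misleadingly low counts.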

The zip file is organized as follows: each SPC is in a separate folder containing one XML file with the annotated SPC in GATE standoff format.
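The following minimal Python sketch reads one of these files with the standard library's `xml.etree.ElementTree`. It assumes the usual GATE XML layout: a `<TextWithNodes>` element whose empty `<Node id="..."/>` markers carry character offsets, followed by `<AnnotationSet>` elements holding `<Annotation Type=... StartNode=... EndNode=...>` entries with optional `<Feature>` name/value pairs. The function names, the `*/*.xml` folder pattern and the returned dictionary layout are hypothetical, so check the annotation set names and `Type` values against the actual files.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def load_gate_document(path):
    """Return (text, annotations) for one GATE XML file.

    Each annotation is a dict with the annotation set name, the Type,
    character offsets, the covered surface string and any features.
    """
    root = ET.parse(path).getroot()

    # <TextWithNodes> interleaves the document text with empty <Node id="n"/>
    # markers, where n is the character offset of the marker. Concatenating
    # the leading text and every node's tail rebuilds the plain text.
    twn = root.find("TextWithNodes")
    pieces = [twn.text or ""]
    for node in twn:
        pieces.append(node.tail or "")
    text = "".join(pieces)

    annotations = []
    for aset in root.findall("AnnotationSet"):
        set_name = aset.get("Name", "")  # the default set has no Name attribute
        for ann in aset.findall("Annotation"):
            start, end = int(ann.get("StartNode")), int(ann.get("EndNode"))
            features = {f.findtext("Name"): f.findtext("Value")
                        for f in ann.findall("Feature")}
            annotations.append({
                "set": set_name,
                "type": ann.get("Type"),
                "start": start,
                "end": end,
                "surface": text[start:end],
                "features": features,
            })
    return text, annotations


def load_corpus(root_dir):
    """Yield (folder_name, text, annotations) for each XML file one level
    below root_dir, i.e. one entry per SPC folder (assumed layout)."""
    for xml_path in sorted(Path(root_dir).glob("*/*.xml")):
        text, annotations = load_gate_document(xml_path)
        yield xml_path.parent.name, text, annotations
```

Rebuilding the text from `<TextWithNodes>` keeps the offsets consistent with the annotations, so `surface` can be compared directly against the gold spans.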

DrugSemantics was designed for developing and testing Spanish NE recognition tools in the pharmacotherapeutic domain.
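When the corpus is used to test an NE recognition tool, a common choice is strict span-level evaluation. Under this scheme, a predicted entity counts as correct only if its start offset, end offset and type all match a gold annotation. The Python sketch below implements that scheme. It is an illustrative assumption and is not necessarily the evaluation protocol used in the associated publication. `strict_prf` is a hypothetical helper.

```python
def strict_prf(gold, predicted):
    """Precision, recall and F1 under exact (start, end, type) matching.

    gold and predicted are iterables of hashable tuples such as
    (start, end, type). Duplicate tuples are counted once.
    """
    gold_set, pred_set = set(gold), set(predicted)
    tp = len(gold_set & pred_set)  # predictions that exactly match a gold entity
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [(0, 10, "Drug"), (15, 17, "Unit of Measurement")]
pred = [(0, 10, "Drug"), (11, 17, "Unit of Measurement")]
print(strict_prf(gold, pred))  # (0.5, 0.5, 0.5)
```

Here the second prediction has the right type but the wrong start offset, so only one of two predictions and one of two gold entities match. When pooling several SPCs, prefix each tuple with a document identifier so that identical offsets in different files are not treated as the same entity.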


Related links

This data is associated with the following publication:

DrugSemantics: A corpus for Named Entity Recognition in Spanish Summaries of Product Characteristics

Published in: Journal of Biomedical Informatics


Cite this dataset

Moreno, Isabel; Boldrini, Ester; Moreda, Paloma; Romá-Ferri, M. Teresa (2017), “DrugSemantics Gold Standard”, Mendeley Data, v1 http://dx.doi.org/10.17632/fwc7jrc5jr.1


Institutions

Universitat d'Alacant Departament de Llenguatges i Sistemes Informatics, Universitat d'Alacant Facultat de Ciencies de la Salut

Categories

Drugs, Disease, Annotation, Natural Language Processing, Information Extraction, Chemical Compound, Spanish Language, Human Language Resources, Excipient, Medication

Licence

CC BY-NC 3.0

The files associated with this dataset are licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported licence.

What does this mean?

You are free to adapt, copy or redistribute the material, provided that you give appropriate attribution and do not use the material for commercial purposes.
