RDD corpus: An annotated corpus relating disabilities and rare diseases

Published: 24-07-2018 | Version 5 | DOI: 10.17632/gs2rs3z3nv.5
Hermenegildo Fabregat Marcos,
Lourdes Araujo,
Juan Martínez Romo


There are a large number of rare diseases, many of which are associated with significant disabilities. Knowing in advance how a disease will evolve is essential for limiting and preventing the appearance of disabilities and for preparing the patient to manage future difficulties. Rare disease associations are working to collect this information manually, but the process is slow. Much information about the consequences of rare diseases is published in scientific papers and could be extracted from them automatically. This new corpus consists of abstracts from scientific papers related to rare diseases, manually annotated with disabilities. It can be used to train machine learning systems that automatically process other papers and extract new information about the relations between rare diseases and disabilities. The corpus is also annotated with negation and speculation where they affect disabilities.