An annotated dataset for gene-melanoma relation extraction from scientific literature

Published: 30 August 2022 | Version 1 | DOI: 10.17632/745bpf597f.1
Contributor:
Roberto Zanoli

Description

Melanoma is the least common but the deadliest of the skin cancers. It begins when the genes of a cell are damaged or fail, so identifying the genes involved is crucial for understanding melanoma tumorigenesis. To date, machine learning for gene-melanoma relation extraction from text has been limited by the lack of annotated resources. To overcome this problem, we exploited the Melanoma Gene Database (a manually curated database of human melanoma-related genes) to build an annotated dataset of binary relations between gene and melanoma entities mentioned in PubMed abstracts. The usability of the dataset was tested with both traditional machine learning and neural network-based models, which were then used to automatically extract gene-melanoma relations from the biomedical literature. Researchers can use the annotated dataset to develop and compare their own models. Moreover, the relations extracted from the literature can be integrated with existing structured knowledge to help researchers search for data.
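
To make the idea of a binary gene-melanoma relation concrete, here is a minimal Python sketch of how candidate pairs in a sentence might be labelled against a curated gene list. The description does not say how the labels were actually assigned, so the labelling rule below (a pair is positive when the gene appears in the curated set) is an assumption, and the names `Mention`, `RelationExample`, `build_examples`, the toy sentence, the placeholder gene `GENEX`, and the toy curated set are all hypothetical and not part of the dataset.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Mention:
    text: str   # surface form as it appears in the sentence
    kind: str   # "GENE" or "MELANOMA"
    start: int  # character offset of the mention in the sentence


@dataclass(frozen=True)
class RelationExample:
    sentence: str
    gene: str
    disease: str
    label: int  # 1 = related, 0 = not related


def build_examples(sentence, mentions, curated_genes):
    """Pair every gene mention with every melanoma mention in one sentence.

    Assumed rule (not taken from the dataset description): a pair is
    labelled 1 when the gene symbol is in the curated set, otherwise 0.
    """
    genes = [m for m in mentions if m.kind == "GENE"]
    diseases = [m for m in mentions if m.kind == "MELANOMA"]
    known = {g.upper() for g in curated_genes}
    return [
        RelationExample(sentence, g.text, d.text, int(g.text.upper() in known))
        for g, d in product(genes, diseases)
    ]


# Toy input, invented for illustration only.
sentence = "BRAF mutations are frequent in melanoma, unlike GENEX."
mentions = [
    Mention("BRAF", "GENE", sentence.index("BRAF")),
    Mention("melanoma", "MELANOMA", sentence.index("melanoma")),
    Mention("GENEX", "GENE", sentence.index("GENEX")),
]
curated_genes = {"BRAF"}  # stand-in for a lookup in a curated database

examples = build_examples(sentence, mentions, curated_genes)
for ex in examples:
    print(ex.gene, ex.disease, ex.label)
```

Each resulting example is one (gene, melanoma) pair with a 0/1 label, which is the form a binary relation classifier, whether a traditional model or a neural one, would take as training input.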

Categories

Natural Language Processing

Licence