MedNER: Covid-19 drug disease named entity recognition dataset from scientific articles

Published: 12 December 2022 | Version 1 | DOI: 10.17632/46p8vp9xts.1
Contributors:
M Saef Ullah Miah, Junaida Sulaiman, Talha Bin Sarwar

Description

This dataset contains drug and disease named-entity annotations, in IOB format, for COVID-19-related text taken from published scientific articles. The annotations were made and verified by domain experts. The dataset has 26528 rows and 3 columns: the first column holds the sentence ID, the second holds the token, and the third holds the drug or disease tag. The data cover five annotated documents out of a set of 25 documents.
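The three-column layout described above can be read back into sentences and entity spans with a few lines of Python. The following is a minimal sketch, not the authors' tooling: the column header names, the tag labels (`B-Drug`, `I-Disease`, `O`), the sample rows, and the helper names `read_sentences` and `extract_entities` are all assumptions made for illustration, so check them against the actual file before use.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the three-column layout (sentence ID, token, tag).
# The header names and the tag labels are assumptions, not taken from the dataset.
SAMPLE = """sentence_id,token,tag
1,Remdesivir,B-Drug
1,was,O
1,tested,O
1,against,O
1,COVID-19,B-Disease
1,.,O
2,Acute,B-Disease
2,respiratory,I-Disease
2,distress,I-Disease
2,syndrome,I-Disease
2,was,O
2,reported,O
2,.,O
"""


def read_sentences(handle):
    """Group (token, tag) pairs by sentence ID, keeping row order."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row (assumed to be present)
    sentences = defaultdict(list)
    for sentence_id, token, tag in reader:
        sentences[sentence_id].append((token, tag))
    return sentences


def extract_entities(pairs):
    """Collect (text, type) spans from one sentence's IOB tags.

    A B- tag opens a span, matching I- tags extend it, and anything
    else closes it. An I- tag with no open span is ignored in this sketch.
    """
    entities = []
    tokens, entity_type = [], None
    for token, tag in pairs:
        if tag.startswith("B-"):
            if tokens:
                entities.append((" ".join(tokens), entity_type))
            tokens, entity_type = [token], tag[2:]
        elif tag.startswith("I-") and tokens and tag[2:] == entity_type:
            tokens.append(token)
        else:
            if tokens:
                entities.append((" ".join(tokens), entity_type))
            tokens, entity_type = [], None
    if tokens:
        entities.append((" ".join(tokens), entity_type))
    return entities


sentences = read_sentences(io.StringIO(SAMPLE))
for sentence_id, pairs in sentences.items():
    print(sentence_id, extract_entities(pairs))
```

To run this against the real file, replace `io.StringIO(SAMPLE)` with an opened file handle, for example `open(path, newline="", encoding="utf-8")`, where `path` points to the downloaded data file.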

Steps to reproduce

1. Collect articles from renowned publishers, selecting those with the highest citation counts.
2. Have a domain expert annotate the text.
3. Have a domain expert validate the annotations.

Institutions

Universiti Malaysia Pahang

Categories

Natural Language Processing, Computer-Aided Design for Biomedical Application
