MedNER: Covid-19 drug disease named entity recognition dataset from scientific articles

Published: 12 December 2022 | Version 1 | DOI: 10.17632/46p8vp9xts.1
M Saef Ullah Miah, Junaida Sulaiman, Talha Bin Sarwar


This dataset contains drug and disease named-entity annotations, in IOB (inside-outside-beginning) format, for COVID-19-related text from published papers. It has been annotated and verified by domain experts. It has 26,528 rows and 3 columns: the first column holds the sentence ID, the second holds the token, and the third holds the drug or disease tag. This version contains data for five annotated documents out of 25 documents.
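As a rough illustration of the three-column layout described above, the following Python sketch groups rows by sentence ID and collects entity spans from IOB tags. The sample rows, the tag names (`B-Drug`, `I-Disease`, and so on), and the function names are assumptions made for this example; the actual tag labels and file layout should be checked against the dataset itself.

```python
# Hypothetical rows in the (sentence ID, token, tag) layout.
# The tokens and tag names below are illustrative, not taken from the dataset.
rows = [
    (1, "Remdesivir", "B-Drug"),
    (1, "was", "O"),
    (1, "tested", "O"),
    (1, "against", "O"),
    (1, "COVID-19", "B-Disease"),
    (2, "Acute", "B-Disease"),
    (2, "respiratory", "I-Disease"),
    (2, "distress", "I-Disease"),
    (2, "syndrome", "I-Disease"),
    (2, "was", "O"),
    (2, "reported", "O"),
]


def group_sentences(rows):
    """Group rows into {sentence_id: [(token, tag), ...]}, keeping row order."""
    sentences = {}
    for sent_id, token, tag in rows:
        sentences.setdefault(sent_id, []).append((token, tag))
    return sentences


def extract_entities(tagged_tokens):
    """Return (entity_text, entity_type) pairs from IOB-tagged tokens."""
    entities = []
    current_tokens, current_type = [], None
    for token, tag in tagged_tokens:
        if tag.startswith("B-"):
            # A B- tag always starts a new entity; close any open one first.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_tokens and tag[2:] == current_type:
            # An I- tag of the same type continues the open entity.
            current_tokens.append(token)
        else:
            # "O", or an I- tag with no matching open entity: close and skip.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


sentences = group_sentences(rows)
for sent_id, tagged in sentences.items():
    print(sent_id, extract_entities(tagged))
```

In this sketch, an `I-` tag that does not follow a matching `B-` tag is treated as outside any entity; other conventions exist, and the choice here is only one reasonable option.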


Steps to reproduce

1. Collect data from renowned publishers, selecting articles by highest citation count.
2. Have the domain expert annotate the data.
3. Have the domain expert validate the annotations.


Universiti Malaysia Pahang


Natural Language Processing, Computer-Aided Design for Biomedical Application