Disease Mentions

Published: 17 April 2023 | Version 2 | DOI: 10.17632/99tkhbwvfg.2
Contributor: Mark Magumba

Description

The data comprises five CSV files containing phrases that mention different disease terms. The largest file contains 13,004 annotated phrases that mention “influenza”, “flu”, “common cold” and “listeria”. The phrases were obtained by paraphrasing tweets with the Hugging Face Pegasus transformer model. The data is intended primarily as training and validation data for building language models. The other four files contain mentions of “norovirus”, “gastroenteritis” and “stomach flu”, “conjunctivitis”, and conjunctivitis as “pink eye”. The data could be used to build classifiers for web-based disease surveillance systems.
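As a rough illustration of the training and validation use described above, the following Python sketch reads annotated phrases from a CSV source and splits them into training and validation sets. The column names `phrase` and `label`, the binary label values, and the sample rows are all assumptions made for this example; they are not taken from the published files, so check the actual headers before adapting it.

```python
import csv
import io
import random
from collections import Counter


def load_phrases(handle, text_col="phrase", label_col="label"):
    """Read (text, label) pairs from an open CSV handle.

    The column names are hypothetical; adjust them to the real headers.
    """
    reader = csv.DictReader(handle)
    return [(row[text_col].strip(), row[label_col].strip()) for row in reader]


def train_validation_split(rows, validation_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and split off a validation set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Invented sample rows standing in for one of the CSV files.
# Assumed labels: "1" = phrase reports an illness, "0" = other mention.
SAMPLE = io.StringIO(
    "phrase,label\n"
    "I have been in bed with the flu all week,1\n"
    "My son caught a common cold at school,1\n"
    "The news report discussed a listeria recall,0\n"
    "Flu season starts earlier every year,0\n"
    "I think I am coming down with influenza,1\n"
)

rows = load_phrases(SAMPLE)
train, validation = train_validation_split(rows, validation_fraction=0.4)
label_counts = Counter(label for _, label in rows)
```

To use a real file, replace `SAMPLE` with a handle such as `open(path, newline="", encoding="utf-8")`, where `path` points to one of the downloaded CSV files. A fixed seed keeps the split reproducible between runs.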

Institutions

Makerere University

Categories

Epidemiology, Health Informatics, Public Health, Information Retrieval, Natural Language Processing, Machine Learning, Information Extraction, Text Mining

Funding

Norwegian program for Development in Higher Education and Research for Development (NORHED)

License