MedNorm: A Corpus and Embeddings for Cross-terminology Medical Concept Normalisation

Published: 3 June 2019 | Version 1 | DOI: 10.17632/b9x7xxb9sz.1
Contributors:
Maksim Belousov, William G. Dixon, Goran Nenadic

Description

MedNorm is a corpus of 27,979 textual descriptions, each mapped simultaneously to both MedDRA and SNOMED-CT, sourced from five publicly available datasets across the biomedical and social media domains. The cross-terminology medical concept embeddings are 64-dimensional vectors for UMLS, MedDRA and SNOMED-CT concepts that capture semantic similarities between concepts from different medical terminologies. For more details, see the paper "MedNorm: A Corpus and Embeddings for Cross-terminology Medical Concept Normalisation".
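
As an illustration of how embeddings of this kind are typically compared, the following minimal sketch ranks concepts by cosine similarity to a query concept. It is not taken from the dataset or the paper: the concept identifiers, the in-memory dictionary layout and the randomly generated vectors are all placeholders, and the actual file format of the released embeddings is not described here. Only the dimensionality (64) comes from the description above.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_concepts(query_id, embeddings, top_k=3):
    """Rank every other concept by cosine similarity to the query concept."""
    query = embeddings[query_id]
    scores = [
        (concept_id, cosine_similarity(query, vector))
        for concept_id, vector in embeddings.items()
        if concept_id != query_id
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_k]


# Placeholder data: random vectors standing in for real concept embeddings.
# The identifiers below are hypothetical, not real MedDRA or SNOMED-CT codes.
DIM = 64  # dimensionality stated in the dataset description
random.seed(0)
base = [random.gauss(0, 1) for _ in range(DIM)]
embeddings = {
    "MEDDRA:EXAMPLE_A": base,
    # A slightly perturbed copy, mimicking the same concept in another terminology.
    "SNOMED:EXAMPLE_A": [x + 0.05 * random.gauss(0, 1) for x in base],
    # An unrelated vector, mimicking a different concept.
    "SNOMED:EXAMPLE_B": [random.gauss(0, 1) for _ in range(DIM)],
}

ranked = nearest_concepts("MEDDRA:EXAMPLE_A", embeddings, top_k=2)
for concept_id, score in ranked:
    print(f"{concept_id}\t{score:.3f}")
```

With real embeddings, the dictionary would be populated from the released files instead of random numbers; the ranking logic would stay the same.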

Institutions

  • The University of Manchester

Categories

Epidemiology, Health Informatics, Social Media, Data Science, Drug Adverse Reactions, Natural Language Processing, Machine Learning, Pharmacovigilance, Medication, Text Mining, Medical Terminology, Twitter

Licence