MedNorm: A Corpus and Embeddings for Cross-terminology Medical Concept Normalisation

Published: 3 Jun 2019 | Version 1 | DOI: 10.17632/b9x7xxb9sz.1

Description of this data

MedNorm is a corpus of 27,979 textual descriptions simultaneously mapped to both MedDRA and SNOMED-CT, sourced from five publicly available datasets across biomedical and social media domains.
The cross-terminology medical concept embeddings are 64-dimensional vectors for UMLS, MedDRA and SNOMED-CT concepts that capture semantic similarities between concepts from different medical terminologies.
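
As an illustration of how such embeddings might be compared across terminologies, the following minimal sketch ranks concepts by cosine similarity. It is not taken from the dataset or the paper: the file format of the released embeddings is not described here, so the sketch uses randomly generated 64-dimensional toy vectors, and the concept identifiers (`MEDDRA:EXAMPLE_A`, `SNOMED:EXAMPLE_A`, `SNOMED:EXAMPLE_B`) and the function names are hypothetical placeholders.

```python
import math
import random


def cosine_similarity(u, v):
    """Return the cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Similarity is undefined for a zero vector; return 0.0 by convention.
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_concepts(query_id, embeddings, top_k=3):
    """Rank all other concepts by cosine similarity to the query concept."""
    query = embeddings[query_id]
    scored = [
        (concept_id, cosine_similarity(query, vector))
        for concept_id, vector in embeddings.items()
        if concept_id != query_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy stand-ins for real embeddings (assumption: one 64-d vector per concept).
rng = random.Random(0)
base = [rng.gauss(0.0, 1.0) for _ in range(64)]
embeddings = {
    # Hypothetical identifiers, not real MedDRA or SNOMED-CT codes.
    "MEDDRA:EXAMPLE_A": base,
    # A slightly perturbed copy, standing in for an equivalent concept.
    "SNOMED:EXAMPLE_A": [x + 0.05 * rng.gauss(0.0, 1.0) for x in base],
    # An independent random vector, standing in for an unrelated concept.
    "SNOMED:EXAMPLE_B": [rng.gauss(0.0, 1.0) for _ in range(64)],
}

for concept_id, score in nearest_concepts("MEDDRA:EXAMPLE_A", embeddings, top_k=2):
    print(f"{concept_id}\t{score:.3f}")
```

With real data, the toy dictionary would be replaced by vectors loaded from the dataset files, in whatever format those files actually use.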

For more details, see the paper entitled "MedNorm: A Corpus and Embeddings for Cross-terminology Medical Concept Normalisation".

Cite this dataset

Belousov, Maksim; Dixon, William G.; Nenadic, Goran (2019), “MedNorm: A Corpus and Embeddings for Cross-terminology Medical Concept Normalisation”, Mendeley Data, v1



The University of Manchester


Epidemiology, Health Informatics, Social Media, Data Science, Drug Adverse Reactions, Natural Language Processing, Machine Learning, Pharmacovigilance, Medication, Text Mining, Medical Terminology, Twitter


CC BY NC 3.0

The files associated with this dataset are licensed under an Attribution-NonCommercial 3.0 Unported licence.

What does this mean?
You are free to adapt, copy or redistribute the material, providing you attribute appropriately and do not use the material for commercial purposes.