A corpus for mining drug-related knowledge from Twitter chatter: Language models and their utilities

Published: 17 Jul 2017 | Version 3 | DOI: 10.17632/dwr4xn8kcv.3
Contributor(s): Abeed Sarker, Graciela Gonzalez

Description of this data

Language models, as described in the publication of the same title.
DSM-langauge-models-3M-LARGE was generated from over 3 million posts using a context window size of 5 and a vector dimension of 400.

**USE THIS**: DSM-language-model-1B-LARGE was generated from approximately 1 billion tweets drawn from user timelines in which at least one medication is mentioned. Like the 3M model, it is a distributional semantic model (DSM).
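
To illustrate what the window size in the description refers to, the following is a minimal sketch of how (target, context) pairs are drawn from a tokenized post. It assumes a word2vec-style symmetric window; this description does not name the toolkit or tokenizer used to build the models. The function name `context_pairs` and the sample post are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple


def context_pairs(tokens: List[str], window: int = 5) -> List[Tuple[str, str]]:
    """Return (target, context) pairs within a symmetric window.

    Assumption: a word2vec-style window, where every token up to
    `window` positions to the left or right of the target counts
    as its context.
    """
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a token is not its own context
                pairs.append((target, tokens[j]))
    return pairs


# Hypothetical tokenized post, for illustration only.
tokens = "took ibuprofen for my headache and felt dizzy".split()

# A small window keeps the output short; the models above use window=5.
pairs = context_pairs(tokens, window=2)
for target, context in pairs[:5]:
    print(target, context)
```

With a window of 5, a word is paired with neighbours up to five positions away on either side. Toolkits such as word2vec treat this value as a maximum and may sample a smaller window for each target word; the sketch always uses the full window. The dimension of 400 would then be the length of the vector learned for each vocabulary word from such pairs.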

Cite this dataset

Sarker, Abeed; Gonzalez, Graciela (2017), “A corpus for mining drug-related knowledge from Twitter chatter: Language models and their utilities”, Mendeley Data, v3 http://dx.doi.org/10.17632/dwr4xn8kcv.3

Categories

Social Networking Service, Drug Adverse Reactions, Language Modeling, Pharmacovigilance

Licence

CC BY 4.0

The files associated with this dataset are licensed under a Creative Commons Attribution 4.0 International licence.

What does this mean?

Unless indicated otherwise, you can share, copy and modify the images or other third party material in this article so long as you give appropriate credit, provide a link to the license, and indicate if changes were made. If the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material.
