Emoji Extractions from Geotagged Twitter Data

Name: Emoji Extractions from Geotagged Twitter Data
Creator: Mayank Kejriwal
Published: 2020-12-18T10:54:00.246Z
Keywords: Social Sciences

doi:10.17632/zw6ypk5345.1

Published: 18 December 2020 | Version 1 | DOI: 10.17632/zw6ypk5345.1

Description

We provide a filtered, pre-processed and anonymized dataset collected originally from the Twitter decahose (a random 10% sample of Twitter) over 29 days in October 2016, to support computational social science research on how people on Twitter use emojis. The data comprises a single table with four columns and 4,057,872 rows (including a header row, i.e. 4,057,871 tweets). The fields of the table are:

ID: A unique tweet ID that can also be used to ‘hydrate’ the tweet, i.e. retrieve its full contents directly from Twitter.
Country: The country code associated with the tweet.
Language: The language metadata tag associated with the tweet. We retain only the 30 most frequent languages (by number of tweets) in our initial corpus.
Emojis: The emojis extracted from the ‘text’ field of the tweet.

The methodology used to extract the tweets is described in the next section.

A paper that uses the data as the basis for its findings, and that also reports descriptive statistics: M. Kejriwal, Q. Wang, H. Li, L. Wang. An Empirical Study of Emoji Usage on Twitter in Linguistic and National Contexts. Online Social Networks and Media, in press.
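
The four-column layout described above lends itself to simple per-column aggregation. The following is a minimal Python sketch, not part of the published dataset or the authors' code, that streams the table once and tallies tweets per language and per country, plus emoji occurrences. It rests on two assumptions that this description does not confirm: (1) the file is a UTF-8 CSV whose header is exactly ID,Country,Language,Emojis, and (2) the emojis within a row are separated by whitespace. The function name summarize and the sep parameter are hypothetical; adjust EXPECTED_COLUMNS and sep to match the actual release.

```python
import csv
import io
from collections import Counter
from typing import Dict, Optional, TextIO

# ASSUMPTION: the released table is a CSV whose header row is exactly this.
# The real file name, delimiter and column spelling may differ.
EXPECTED_COLUMNS = ["ID", "Country", "Language", "Emojis"]


def summarize(handle: TextIO, sep: Optional[str] = None) -> Dict[str, Counter]:
    """Stream the table once; count tweets per language/country and emoji tokens.

    ASSUMPTION (hypothetical format): emojis in the 'Emojis' field are
    separated by `sep`; None means any run of whitespace, as in str.split.
    """
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames!r}")

    languages: Counter = Counter()
    countries: Counter = Counter()
    emojis: Counter = Counter()
    for row in reader:
        languages[row["Language"]] += 1
        countries[row["Country"]] += 1
        # A short row yields None for missing fields, so fall back to "".
        field = row["Emojis"] or ""
        # Skip empty tokens produced by trailing or repeated separators.
        emojis.update(tok for tok in field.split(sep) if tok)
    return {"Language": languages, "Country": countries, "Emojis": emojis}


if __name__ == "__main__":
    # Toy rows for illustration only; they are NOT taken from the dataset.
    toy_csv = (
        "ID,Country,Language,Emojis\n"
        "1,US,en,😂 😂 ❤\n"
        "2,GB,en,😂\n"
        "3,ES,es,❤\n"
    )
    summary = summarize(io.StringIO(toy_csv))
    print(summary["Language"].most_common())
    print(summary["Emojis"].most_common(3))
```

For the real file, open it with open(path, newline="", encoding="utf-8"), as the csv module recommends, where path is whatever the downloaded file is called. Given the description, the language counts should sum to 4,057,871 and contain at most 30 distinct tags, which makes a cheap sanity check. Tweet IDs are never converted to numbers here; they are 64-bit integers that float-based tools can silently corrupt. If the emojis turn out to be concatenated with no separator, whitespace splitting will not work, and a grapheme-aware split (for example \X in the third-party regex module) would be needed so that multi-code-point emojis stay intact.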

Institutions

University of Southern California
