Indonesian Biodiversity-related Tweets Including Health, Food Security, and Environmental Management Issues for Sentiment Analysis

Name: Indonesian Biodiversity-related Tweets Including Health, Food Security, and Environmental Management Issues for Sentiment Analysis
Creator: Mohammad Teduh Uliniansyah
Published: 2023-10-17T05:40:53.280Z
Keywords: Social Sciences, Computer Science, Natural Language Processing, Biodiversity, Text Mining

Uliniansyah, Mohammad Teduh; Santosa, Agung; Latief, Andi Djalal; Jarin, Asril; Afra, Dian Isnaeni Nurul; Nurfadhilah, Elvira; Gunarso, Gunarso; budi, indra; Hidayati, Nuraisa Novia; Fajri, Radhiyatul; Suryono, Ryan Randy; Pebiana, Siska; Shaleha, Siti; Ramdhani, Tosan Wiar; Sampurno, Tri; Jiwanggi, Meganingrum Arista; Raif, M. Irfan; Nanda, Tri

doi:10.17632/xtk9wsxjjr.4

Published: 17 October 2023 | Version 4 | DOI: 10.17632/xtk9wsxjjr.4

Contributors:

Mohammad Teduh Uliniansyah, Agung Santosa, Andi Djalal Latief, Asril Jarin, Dian Isnaeni Nurul Afra, Elvira Nurfadhilah, Gunarso Gunarso, indra budi, Nuraisa Novia Hidayati, Radhiyatul Fajri, Ryan Randy Suryono, Siska Pebiana, Siti Shaleha, Tosan Wiar Ramdhani, Tri Sampurno, Meganingrum Arista Jiwanggi, M. Irfan Raif, Tri Nanda

Description

The dataset was collected through Twitter API services using around 30 biodiversity-related keywords and covers tweets dated from January 2020 to March 2023. The data was then refined by filtering out irrelevant material: non-Indonesian content, content unrelated to biodiversity, spam, and duplicate entries. Eighteen independent annotators manually assigned sentiment labels to the dataset. Twelve of them were researchers and engineers specializing in natural language processing, of whom two held Ph.D. degrees, nine held MSc degrees, and one held a BSc degree. The other six were four lecturers and two experts in natural language processing, each with a Ph.D. or MSc degree. Sentiments were divided into three classes, and the final class label was determined by majority voting.

Steps to reproduce

* The data can be collected again by looking up the tweet IDs listed in the file biodiversity_raw.csv.
* The file biodiversity_labeled.csv contains the 1st annotator label, the 2nd annotator label, the 3rd annotator label, and the final label, so users can compare their own labels with ours.
* In our experiments, the best model was the one built from IndoBert Tweet, which can be downloaded from the Hugging Face site.
* We are drafting a paper titled "Twitter Dataset on Public Sentiments Towards Biodiversity Policy in Indonesia." Users who run into problems may refer to this paper.
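
The majority-voting rule and the label comparison mentioned above can be sketched in a few lines of Python. This is only an illustration, not the authors' script: the column names (`tweet_id`, `label_1`, `label_2`, `label_3`, `final_label`) and the label values in the sample are assumptions, so check them against the actual header of biodiversity_labeled.csv. The source does not say how a three-way disagreement was resolved, so this sketch simply reports such rows instead of guessing.

```python
import csv
import io
from collections import Counter


def majority_label(labels):
    """Return the label held by a strict majority of annotators, or None if there is none."""
    top_label, top_count = Counter(labels).most_common(1)[0]
    if top_count * 2 > len(labels):
        return top_label
    return None


def find_mismatches(rows, annotator_cols, final_col):
    """Return the rows whose stored final label differs from the recomputed majority vote."""
    mismatches = []
    for row in rows:
        voted = majority_label([row[col] for col in annotator_cols])
        if voted != row[final_col]:
            mismatches.append(row)
    return mismatches


# Tiny in-memory sample. Column names and label values are hypothetical;
# replace this with open("biodiversity_labeled.csv", newline="", encoding="utf-8")
# and the real column names when working with the dataset.
SAMPLE = """tweet_id,label_1,label_2,label_3,final_label
1,positive,positive,neutral,positive
2,negative,neutral,negative,negative
3,positive,neutral,negative,neutral
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
mismatches = find_mismatches(rows, ["label_1", "label_2", "label_3"], "final_label")

# Rows with no strict majority (or a final label that disagrees with the vote) are listed here.
print([row["tweet_id"] for row in mismatches])
```

To compare your own annotations with the published ones, the same `find_mismatches` helper can be called with a single column holding your labels in place of the three annotator columns.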
