Indonesian Biodiversity-related Tweets Including Health, Food Security, and Environmental Management Issues for Sentiment Analysis

Published: 16 August 2023 | Version 3 | DOI: 10.17632/xtk9wsxjjr.3
Contributors:
Mohammad Teduh Uliniansyah, Agung Santosa, Andi Djalal Latief, Asril Jarin, Dian Isnaeni Nurul Afra, Elvira Nurfadhilah, Gunarso Gunarso, Indra Budi, Nuraisa Novia Hidayati, Radhiyatul Fajri, Ryan Randy Suryono, Siska Pebiana, Siti Shaleha, Tosan Wiar Ramdhani, Tri Sampurno, Meganingrum Arista Jiwanggi, M. Irfan Raif, Tri Nanda

Description

The dataset was collected through the Twitter API using around 30 biodiversity-related keywords, covering tweets dated from January 2020 to March 2023. The data was then filtered to remove irrelevant content, including non-Indonesian tweets, tweets unrelated to biodiversity, spam, and duplicate entries. Eighteen independent annotators manually assigned sentiment labels to the dataset. Twelve were researchers and engineers specializing in natural language processing, of whom two held Ph.D. degrees, nine held MSc degrees, and one held a BSc degree. The remaining six were four lecturers and two experts in natural language processing, each holding a Ph.D. or MSc degree. Each tweet was assigned one of three sentiment classes, and the final class label was determined by majority vote.
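
As a rough illustration of the majority-voting rule mentioned above, the following Python sketch picks the label that most annotators agree on. The class names ("positive", "neutral", "negative") and the handling of ties (returning None when no label has a strict majority) are assumptions made for this example; the description does not name the three classes or state how disagreements among all annotators were resolved.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_label(labels: Sequence[str]) -> Optional[str]:
    """Return the most frequent label, or None if the top count is tied.

    Tie handling is an assumption for this sketch; the dataset
    description does not say how ties were resolved.
    """
    counts = Counter(labels).most_common()
    if not counts:
        return None
    # A tie exists when the two most frequent labels share the same count.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


# Hypothetical class names, used only for illustration.
agreed = majority_label(["positive", "positive", "neutral"])
split = majority_label(["positive", "neutral", "negative"])
```

Here `agreed` is "positive" because two of the three labels match, while `split` is None because each label appears only once.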

Files

Steps to reproduce

  • The data can be collected again by referring to the tweet IDs in the file biodiversity_raw.csv.
  • The file biodiversity_labeled.csv contains the 1st, 2nd, and 3rd annotator labels together with the final label, so that users may compare their own labels with ours.
  • In our experiments, the best-performing model was the one built on IndoBert Tweet, which can be downloaded from the Hugging Face site.
  • We are drafting a paper titled "Twitter Dataset on Public Sentiments Towards Biodiversity Policy in Indonesia." Users who encounter problems may refer to this paper.
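
One way to compare a user's own labels with the final labels in biodiversity_labeled.csv is to compute a simple agreement rate, as in the sketch below. The column names tweet_id and final_label, the class names, and the in-memory sample standing in for the real file are all assumptions for illustration; the actual header names should be checked in the downloaded file.

```python
import csv
import io
from typing import Dict, Iterable, Mapping


def agreement_rate(
    rows: Iterable[Mapping[str, str]],
    own_labels: Dict[str, str],
    id_col: str = "tweet_id",        # hypothetical column name
    final_col: str = "final_label",  # hypothetical column name
) -> float:
    """Fraction of tweets where the user's label equals the final label.

    Only tweets that appear in own_labels are compared.
    """
    matched = 0
    compared = 0
    for row in rows:
        tweet_id = row[id_col]
        if tweet_id in own_labels:
            compared += 1
            if own_labels[tweet_id] == row[final_col]:
                matched += 1
    if compared == 0:
        raise ValueError("no overlapping tweet IDs to compare")
    return matched / compared


# A tiny in-memory stand-in for the labeled CSV (made-up rows).
sample = io.StringIO(
    "tweet_id,final_label\n"
    "1,positive\n"
    "2,negative\n"
    "3,neutral\n"
)
rows = list(csv.DictReader(sample))
rate = agreement_rate(rows, {"1": "positive", "2": "neutral"})
```

With the made-up rows above, the user labels two tweets and matches the final label on one of them, so `rate` is 0.5. To run this on the real file, replace the `io.StringIO` sample with an opened file handle and adjust the column names.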

Institutions

  • Badan Pengkajian dan Penerapan Teknologi
  • Universitas Indonesia

Categories

Social Sciences, Computer Science, Natural Language Processing, Biodiversity, Text Mining

Funders

Licence