INDONESIAN HOAX NEWS DETECTION DATASET

Published: 15 October 2018 | Version 1 | DOI: 10.17632/p3hfgr5j3m.1
Contributors:
FAISAL RAHUTOMO, INGGRID YANUAR

Description

This dataset accompanies the following study:

I. Y. R. Pratiwi, R. A. Asmara and F. Rahutomo, "Study of hoax news detection using naïve bayes classifier in Indonesian language," 2017 11th International Conference on Information & Communication Technology and System (ICTS), Surabaya, 2017, pp. 73-78. doi: 10.1109/ICTS.2017.8265649

Keywords: Bayes methods; Internet; natural language processing; pattern classification; Web sites; naïve bayes classifier; Indonesian language; online news articles; internet; websites; hoax recall; hoax precision; hoax news article detection; automatic hoax news detection; fake news article; search news; Information and communication technology; Google; Uniform resource locators; Hoax news detection; dataset; naïve bayes classifier

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8265649&isnumber=8265591

The dataset contains five files:

1. 250 news with valid hoax label.csv
   Contains 250 Indonesian news texts, each labelled hoax or valid.

2. 250 news labelling process.xlsx
   Documents the labelling process for 10 news topics. Three referees labelled each news link as hoax or valid, and the final label was decided by a vote among the three referees.

3. 250 news experiment documentation.xlsx
   Documents the experiments run on the dataset with a naïve Bayes classifier (NBC). The experiments were conducted three times with different training/testing sizes: 60-40, 70-30, and 80-20.

4. 600 news with valid hoax label.csv
   Contains 600 Indonesian news texts, each labelled hoax or valid.

5. 600 news labelling process.xlsx
   Documents the labelling process for 12 news topics. Three referees labelled each news link as hoax or valid, and the final label was decided by a vote among the three referees.
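The referee voting described for the labelling files can be illustrated with a minimal Python sketch. It assumes a simple majority vote over three hoax/valid votes; the function name majority_label and the sample votes are hypothetical and are not taken from the dataset or the paper.

```python
from collections import Counter


def majority_label(votes):
    """Return the label given by most referees.

    Assumes an odd number of votes over two labels, so a tie cannot occur.
    """
    label, _count = Counter(votes).most_common(1)[0]
    return label


# Hypothetical votes from three referees for three news links.
referee_votes = [
    ("hoax", "hoax", "valid"),
    ("valid", "valid", "valid"),
    ("valid", "hoax", "hoax"),
]

final_labels = [majority_label(votes) for votes in referee_votes]
print(final_labels)
```

With three referees and two possible labels, one label always receives at least two votes, so no tie-breaking rule is needed in this sketch.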

Files

Steps to reproduce

Unzip the downloaded archive and load the data from the CSV or XLSX files.
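As a rough illustration of loading a CSV file and forming the 60-40, 70-30, and 80-20 training/testing schemes, the following Python sketch uses only the standard library. The column names text and label, the UTF-8 encoding, the comma delimiter, and the random shuffle are all assumptions; check the actual header of the CSV files before relying on them. The helper names read_rows and split_rows are hypothetical.

```python
import csv
import io
import random


def read_rows(handle):
    """Read CSV rows as dictionaries keyed by the header line."""
    return list(csv.DictReader(handle))


def split_rows(rows, train_fraction, seed=0):
    """Shuffle rows reproducibly and split them into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# In-memory stand-in for a real file; column names are hypothetical.
# For the real data, something like the following may work (assumed encoding):
#   with open("250 news with valid hoax label.csv", newline="", encoding="utf-8") as f:
#       rows = read_rows(f)
sample = io.StringIO(
    "text,label\n"
    + "".join(f"berita {i},{'hoax' if i % 2 else 'valid'}\n" for i in range(10))
)
rows = read_rows(sample)

# The three training/testing schemes named in the description.
for train_fraction in (0.6, 0.7, 0.8):
    train, test = split_rows(rows, train_fraction)
    print(train_fraction, len(train), len(test))
```

A fixed seed keeps the split repeatable between runs. Whether the original experiments shuffled the data before splitting is not stated here, so the shuffle is only one possible choice.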

Categories

Indonesian Language, Text Mining

Licence