Indonesian Travel Reviews for Text Summarization

Published: 13 August 2023 | Version 1 | DOI: 10.17632/x2r86kfrhp.1
Narandha A Ranggianto, Diana Purwitasari, Chastine Fatichah


This dataset is intended for text summarization and can be used with either extractive or abstractive approaches. The reviews date from 2018 to 2022 and fall into three categories: attraction, hotel, and restaurant. Each category contains 100 distinct objects, for a total of 300 objects. Each object has 5 reviews and 1 ground truth, giving 1,500 reviews and 300 ground-truth summaries in all. The ground truth is a reference summary created by 3 experts. Two of them hold bachelor's degrees in Indonesian Language and Literature Education and have worked as teachers for more than 2 years; the third holds a bachelor's degree in Indonesian Literature and has 2 years of experience as an NLP annotator. Each category folder, such as the 'attraction' folder, contains 4 subfolders, each holding 25 objects.
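The counts in the description can be checked after download with a short script. The following is a minimal Python sketch, not an official loader. It assumes the category folders are named 'attraction', 'hotel', and 'restaurant' (only 'attraction' is confirmed above) and that each object is a single entry, file or folder, inside a subfolder. The function names `expected_totals` and `count_objects` are hypothetical, and the file format of the reviews is not specified here, so the script counts entries without reading them.

```python
from pathlib import Path

# Assumption: folder names match the category names. Only 'attraction'
# is named in the description; the other two are guesses.
CATEGORIES = ("attraction", "hotel", "restaurant")
SUBFOLDERS_PER_CATEGORY = 4
OBJECTS_PER_SUBFOLDER = 25
REVIEWS_PER_OBJECT = 5


def expected_totals():
    """Derive the totals implied by the dataset description."""
    objects_per_category = SUBFOLDERS_PER_CATEGORY * OBJECTS_PER_SUBFOLDER
    objects = objects_per_category * len(CATEGORIES)
    return {
        "objects_per_category": objects_per_category,
        "objects": objects,
        "reviews": objects * REVIEWS_PER_OBJECT,
        "ground_truths": objects,  # one reference summary per object
    }


def count_objects(root):
    """Count entries two levels below each category folder.

    Assumption: every entry inside a subfolder is exactly one object.
    A missing category folder is reported as 0 rather than raising.
    """
    counts = {}
    for category in CATEGORIES:
        category_dir = Path(root) / category
        if not category_dir.is_dir():
            counts[category] = 0
            continue
        subfolders = [p for p in category_dir.iterdir() if p.is_dir()]
        counts[category] = sum(1 for sub in subfolders for _ in sub.iterdir())
    return counts
```

Calling `count_objects` on the extracted dataset root and comparing each value with `expected_totals()["objects_per_category"]` gives a quick sanity check. A mismatch may only mean that the actual layout differs from the assumptions above.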



Information Retrieval, Natural Language Processing, Text Mining