Tourism Dataset

Published: 29 August 2023 | Version 1 | DOI: 10.17632/h58s544674.1
choirul huda huda


This dataset contains tourist visit transactions at three popular tourist destinations in Indonesia: Bali, Malang, and Yogyakarta. The records describe historical tourism experiences posted on the TripAdvisor website and were collected between October 2022 and January 2023. The dataset is intended to support researchers, particularly in tourism and soft computing, and is not intended for commercial use.


Steps to reproduce

A series of activities was carried out to make use of the information that TripAdvisor publishes openly on the front end of its website, with the aim of providing an adequate dataset to help researchers conduct various studies. The main steps in developing the dataset were crawling, pre-processing, and modelling.

In the first step, the WebHarvy crawler was used to collect publicly available tourism information from the TripAdvisor website. The collected information was unstructured and mixed several data subjects, so it had to be normalized. Before pre-processing, the raw data was converted to Excel format for further analysis.

Pre-processing removed empty, irrelevant, and duplicate values from the dataset. The resulting dataset consists of nine tables: User, Item, Transaction, Continent, Region, Country, City, Mode, and Type. To make the data ready for soft computing research and to preserve confidentiality, the data values have been encoded.

To enhance the dataset, additional data and validation were added by manually searching Google Maps for location information (Continent, Region, Country, and City) based on the raw values from TripAdvisor. A relational data model is also provided to help researchers understand the relationships between data entities and how the values in one entity are described in related entities.
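The pre-processing, encoding, and relational modelling steps described above can be illustrated with a short Python sketch. This is a minimal, standard-library-only example under stated assumptions, not the authors' pipeline. The raw-record layout (user, attraction, city, rating), the function names `preprocess`, `encode`, and `build_tables`, and the use of sequential integer IDs as the encoding scheme are all hypothetical. The source does not define what counts as an "irrelevant" value, so that check is left as a caller-supplied predicate. Only four of the nine tables (User, Item, City, Transaction) are modelled.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical raw record after crawling: (user, attraction, city, rating).
# This layout is an illustrative assumption, not the published schema.
RawRow = Tuple[Optional[str], Optional[str], Optional[str], Optional[int]]


def is_empty(value: object) -> bool:
    """Treat None and blank strings as empty values."""
    return value is None or (isinstance(value, str) and not value.strip())


def preprocess(
    rows: List[RawRow],
    is_relevant: Callable[[RawRow], bool] = lambda row: True,
) -> List[RawRow]:
    """Drop rows with empty values, rows rejected by is_relevant, and exact duplicates."""
    seen = set()
    clean: List[RawRow] = []
    for row in rows:
        if any(is_empty(v) for v in row) or not is_relevant(row):
            continue
        if row in seen:  # exact duplicate of a row already kept
            continue
        seen.add(row)
        clean.append(row)
    return clean


def encode(values: List[str]) -> Dict[str, int]:
    """Assign integer IDs, starting at 1, in order of first appearance."""
    codes: Dict[str, int] = {}
    for v in values:
        codes.setdefault(v, len(codes) + 1)
    return codes


def build_tables(rows: List[RawRow]):
    """Split clean rows into encoded User, Item, City, and Transaction tables."""
    users = encode([r[0] for r in rows])
    items = encode([r[1] for r in rows])
    cities = encode([r[2] for r in rows])
    # Item -> City foreign key: each encoded item points at its encoded city.
    item_city = {items[r[1]]: cities[r[2]] for r in rows}
    # Transaction rows hold only encoded keys plus the rating.
    transactions = [(users[r[0]], items[r[1]], r[3]) for r in rows]
    return users, items, cities, item_city, transactions


if __name__ == "__main__":
    # Toy rows invented for illustration; not taken from the dataset.
    raw = [
        ("user_a", "attraction_1", "Bali", 5),
        ("user_a", "attraction_1", "Bali", 5),         # exact duplicate
        ("user_b", "attraction_2", "Malang", None),    # empty rating
        ("user_b", "attraction_3", "Yogyakarta", 4),
    ]
    users, items, cities, item_city, tx = build_tables(preprocess(raw))
    # Join Transaction -> Item -> City to recover the readable city name.
    city_name = {cid: name for name, cid in cities.items()}
    for user_id, item_id, rating in tx:
        print(user_id, item_id, city_name[item_city[item_id]], rating)
```

Run as a script, it prints `1 1 Bali 5` and `2 2 Yogyakarta 4`: the duplicate row and the row with an empty rating are dropped, and each transaction's city is recovered by following the encoded item to its city key. Under this assumed scheme, the encoding dictionaries would be kept private, so the published codes reveal no names while still supporting joins across tables.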


Bina Nusantara University


Tourism, Soft Computing