A Dataset of TripAdvisor Guest Reviews for Major Hotels in Salalah, Oman

Published: 24 February 2025 | Version 3 | DOI: 10.17632/dkfwj76kx6.3
Contributor:
Ricardo Biason

Description

This dataset contains TripAdvisor guest reviews for major hotels in Salalah, Oman, collected through web scraping. It provides insights into guest satisfaction, sentiment, and ratings, making it a valuable resource for marketing, hospitality and tourism research, sentiment analysis, and tourism marketing studies.

Hotels Included in the Dataset

The dataset features guest reviews from the following hotels in Salalah:
• Al Baleed Resort Salalah by Anantara
• Belad Bont Resort
• Crowne Plaza Resort Salalah
• Fanar Hotel and Residences
• Hilton Salalah Resort
• Juweira Boutique Hotel
• Millennium Resort Salalah
• Salalah Gardens Hotel
• Salalah Rotana Resort

Time Coverage

The dataset captures all available guest reviews from the beginning of each hotel's presence on TripAdvisor up until February 2025.

Relevance to Khareef Tourism and Oman Vision 2040

This dataset is particularly beneficial for the following government agencies:
• Ministry of Heritage and Tourism - Oman
• Oman Chamber of Commerce & Industry (OCCI)
• Dhofar Municipality and Dhofar Tourism Department
• National Centre for Statistics and Information (NCSI)
• Oman Vision 2040 Implementation Follow-up Unit
• Ministry of Commerce, Industry, and Investment Promotion
• Oman Tourism Development Company (OMRAN)
• Ministry of Transport, Communications, and Information Technology (MTCIT)
• Dhofar Governorate Office
• Ministry of Environment and Climate Affairs

It also serves as a valuable resource for researchers, policymakers, and marketing, hospitality & tourism professionals to enhance Salalah's tourism sector, improve guest satisfaction, and support Oman's long-term vision for a thriving and sustainable tourism industry.
Salalah experiences a surge in visitors during the Khareef season (monsoon season), a critical period for the hospitality industry. This dataset can help analyze guest experiences, identify service gaps, and optimize offerings during this peak tourism period.

Oman Vision 2040 Goals

The dataset aligns with Oman's Vision 2040, which prioritizes tourism sector growth, economic diversification, and enhanced customer experiences. By leveraging sentiment analysis and guest insights, policymakers and hotel managers can develop data-driven strategies to improve hospitality services, attract more visitors, and enhance Salalah's reputation as a premier travel destination.

Potential Use Cases

• Sentiment Analysis: Understanding guest satisfaction trends over time
• Tourism & Hospitality Research: Evaluating service quality and hotel performance across different years
• Marketing Insights: Identifying key drivers of positive and negative reviews for strategic decision-making
• Machine Learning & NLP: Training models for text classification, sentiment prediction, and recommendation systems

Steps to reproduce

A structured data extraction and preprocessing workflow was used to collect, transform, and clean publicly available review data from the front-end interface of the TripAdvisor website, preparing it for downstream data analytics, sentiment analysis, and predictive modeling.

Unstructured review data were extracted with the TripAdvisor Review Scraper by ExtensionsBox, which exports the reviews to a CSV file containing multiple attributes. The link to each hotel's TripAdvisor page was used to extract data directly from that hotel's pages. The dataset includes nine hotels, selected based on TripAdvisor's "Best of the Best" Award and "Travelers' Choice" Award criteria, along with other major hotels, so that the dataset focuses on top-rated accommodations in Salalah, Oman. The nine individual datasets were then merged into a single combined dataset to allow comparison across all hotels.

Data processing and transformation were carried out in Python with the pandas and NumPy libraries. Because the raw export contains personally identifiable information (PII), the following sensitive attributes were deleted: Review ID, User ID, Display Name, Username, User Profile, User Avatar, Photos, and URLs. This step follows data privacy best practices while preserving the integrity and usability of the dataset.

The "Additional Ratings" column, which originally held multiple review aspects in a single unstructured field, was parsed into structured features. The column was converted into a dictionary-like format, and numerical values were extracted for Value, Rooms, Location, Cleanliness, Service, and Sleep Quality. Where a user did not provide a rating, the missing value was replaced with 0 so that all ratings are stored as integers. A value of 0 therefore indicates that no rating was given, not a score of zero. Once the transformation was complete, the original "Additional Ratings" column was dropped, leaving a structured dataset ready for sentiment analysis, machine learning models, and consumer behavior analysis.
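The merging, PII removal, and rating-extraction steps described above could be sketched roughly as follows. This is a minimal illustration, not the contributor's actual script. It assumes that the CSV headers match the attribute names given in the text and that each "Additional Ratings" cell is a dict-like string such as "{'Value': 4, 'Rooms': 5}"; the real export format may differ. The function names, the "Hotel" and "Rating" columns, and the sample rows are made up for demonstration.

```python
import ast

import pandas as pd

# Attribute names as listed in the text; the exact CSV headers are an assumption.
PII_COLUMNS = ["Review ID", "User ID", "Display Name", "Username",
               "User Profile", "User Avatar", "Photos", "URLs"]
ASPECTS = ["Value", "Rooms", "Location", "Cleanliness", "Service", "Sleep Quality"]


def parse_additional_ratings(cell):
    """Turn one 'Additional Ratings' cell into a dict of aspect -> rating.

    Assumes the cell holds a dict-like literal; anything else yields {}.
    """
    if not isinstance(cell, str) or not cell.strip():
        return {}
    try:
        parsed = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return {}
    return parsed if isinstance(parsed, dict) else {}


def clean_reviews(frames):
    """Merge per-hotel frames, drop PII columns, and expand aspect ratings."""
    combined = pd.concat(frames, ignore_index=True)
    # errors="ignore" skips any listed column that is absent from the export.
    combined = combined.drop(columns=PII_COLUMNS, errors="ignore")
    parsed = combined["Additional Ratings"].apply(parse_additional_ratings)
    for aspect in ASPECTS:
        ratings = pd.to_numeric(parsed.apply(lambda d: d.get(aspect)), errors="coerce")
        # Missing ratings become 0 so that every aspect column is an integer.
        combined[aspect] = ratings.fillna(0).astype(int)
    return combined.drop(columns=["Additional Ratings"])


# Two tiny made-up frames standing in for the per-hotel CSV exports.
hotel_a = pd.DataFrame({
    "Hotel": ["Hotel A", "Hotel A"],
    "Username": ["user1", "user2"],
    "Rating": [5, 3],
    "Additional Ratings": ["{'Value': 4, 'Rooms': 5}", None],
})
hotel_b = pd.DataFrame({
    "Hotel": ["Hotel B"],
    "Username": ["user3"],
    "Rating": [4],
    "Additional Ratings": ["{'Service': 5, 'Sleep Quality': 3}"],
})

cleaned = clean_reviews([hotel_a, hotel_b])
print(cleaned)
```

Since 0 is a placeholder for a missing rating, it should be filtered out before averaging an aspect column, for example with cleaned.loc[cleaned["Value"] > 0, "Value"].mean().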

Categories

Consumer Behavior, Natural Language Processing, Hospitality Management, Market Segmentation, Social Media Analytics, Sentiment Analysis

Licence