Loyalty Status of Hotel Guests: Simulated Dataset for Predicting Customer Loyalty
Description
This dataset consists of 2,000 simulated hotel guest records, designed for predicting customer loyalty from several key features. Each guest's loyalty status ("Yes" for loyal, "No" for non-loyal) indicates eligibility for loyalty discounts: guests marked as loyal qualify for these discounts. The features include booking frequency, days since the last booking, total revenue generated, average stay duration, and total meal charges. The dataset was generated in Python, using the Faker library to create realistic guest names and email addresses; the names come from several languages to reflect an international guest base. The countries represented are India, the United States, the United Kingdom, France, Germany, Spain, Italy, Japan, and China. The Python script used to create the dataset is provided alongside it for those interested in the methodology. The dataset is intended as a practical resource for exploring machine learning and data analysis techniques in the hospitality sector. We are building it as part of a dynamic pricing project for hotels, with the aim of improving decision-making through customer segmentation.
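For readers who want to use the data for loyalty prediction, the sketch below shows one way to turn the CSV file into a numeric feature matrix and a binary target. It is an illustration under assumptions, not part of the published files: the column names (`booking_frequency`, `days_since_last_booking`, `total_revenue`, `avg_stay_duration`, `total_meal_charges`, `loyalty_status`) are hypothetical, so adjust `FEATURES` and `LABEL` to match the actual header.

```python
import csv
from typing import Iterable, List, Tuple

# Hypothetical column names; check them against the header of the published CSV.
FEATURES = ["booking_frequency", "days_since_last_booking",
            "total_revenue", "avg_stay_duration", "total_meal_charges"]
LABEL = "loyalty_status"


def load_features_and_labels(lines: Iterable[str]) -> Tuple[List[List[float]], List[int]]:
    """Parse CSV lines into a feature matrix X and binary labels y (1 = loyal, 0 = not loyal)."""
    reader = csv.DictReader(lines)
    X: List[List[float]] = []
    y: List[int] = []
    for row in reader:
        # Keep only the numeric features; name and email columns are ignored.
        X.append([float(row[name]) for name in FEATURES])
        # Compare case-insensitively so "Yes"/"yes" and "No"/"no" are both accepted.
        label = row[LABEL].strip().lower()
        if label not in ("yes", "no"):
            raise ValueError(f"unexpected loyalty status: {row[LABEL]!r}")
        y.append(1 if label == "yes" else 0)
    return X, y
```

With a hypothetical file name, usage would look like `X, y = load_features_and_labels(open("hotel_guests.csv", newline=""))`. The resulting lists can be passed to most machine learning libraries; the label check fails loudly on unexpected values rather than silently treating them as non-loyal.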
Steps to reproduce
1. The dataset was synthetically created using Python and the Faker library, which generates fake yet realistic guest information such as names, email addresses, and other attributes.
2. Features such as booking frequency, days since the last booking, total revenue generated, and average stay duration were generated using randomization techniques to simulate real-world hotel guest data.
3. The loyalty status ("Yes"/"No") was determined by applying specific thresholds to booking frequency, days since the last booking, and total revenue generated.
4. The code for generating the dataset can easily be modified for different parameters and scaled to any number of rows. Researchers can reproduce or customize this dataset by adjusting the thresholds and feature generation logic in the Python script.
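The generation procedure described in these steps can be sketched in plain Python as follows. This is not the authors' script: the value ranges, the threshold constants (`MIN_BOOKINGS`, `MAX_DAYS_SINCE_LAST`, `MIN_REVENUE`), the assumption that a guest is loyal only when all three criteria hold, and the field names are all hypothetical. Guest names and email addresses, which the original script creates with Faker, are omitted so the example runs with the standard library alone.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

# Countries named in the dataset description.
COUNTRIES = ["India", "United States", "United Kingdom", "France",
             "Germany", "Spain", "Italy", "Japan", "China"]

# Hypothetical thresholds; the published script's actual values are not stated here.
MIN_BOOKINGS = 5
MAX_DAYS_SINCE_LAST = 180
MIN_REVENUE = 2000.0


@dataclass
class GuestRecord:
    guest_id: int
    country: str
    booking_frequency: int
    days_since_last_booking: int
    total_revenue: float
    avg_stay_duration: float
    total_meal_charges: float
    loyalty_status: str  # "Yes" or "No"


def is_loyal(frequency: int, days_since_last: int, revenue: float) -> bool:
    """Threshold rule (assumed AND-combination of the three criteria)."""
    return (frequency >= MIN_BOOKINGS
            and days_since_last <= MAX_DAYS_SINCE_LAST
            and revenue >= MIN_REVENUE)


def generate_guests(n: int, seed: Optional[int] = None) -> List[GuestRecord]:
    """Generate n synthetic guest records; a fixed seed makes the output reproducible."""
    rng = random.Random(seed)
    records = []
    for i in range(n):
        frequency = rng.randint(1, 20)      # number of bookings
        days_since = rng.randint(1, 730)    # up to two years since last booking
        avg_stay = round(rng.uniform(1.0, 10.0), 1)  # nights per stay
        # Revenue = total nights * an assumed nightly rate between 50 and 300.
        revenue = round(frequency * avg_stay * rng.uniform(50.0, 300.0), 2)
        # Meal charges as an assumed 5-25% share of revenue.
        meals = round(revenue * rng.uniform(0.05, 0.25), 2)
        loyal = is_loyal(frequency, days_since, revenue)
        records.append(GuestRecord(
            guest_id=i + 1,
            country=rng.choice(COUNTRIES),
            booking_frequency=frequency,
            days_since_last_booking=days_since,
            total_revenue=revenue,
            avg_stay_duration=avg_stay,
            total_meal_charges=meals,
            loyalty_status="Yes" if loyal else "No",
        ))
    return records
```

Calling `generate_guests(2000, seed=42)` yields 2,000 records. Changing `n` or the threshold constants corresponds to the kind of customization mentioned in step 4, and keeping the rule in a separate `is_loyal` function makes it easy to swap in a different labeling policy.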