Dataset for Explainable AI-Based Tourism Product Adoption Prediction and Tourist Segmentation Using Digital Engagement Behaviour

Published: 23 May 2026 | Version 1 | DOI: 10.17632/zw3ft8r3t6.1
Contributor:
Samuel Mores Geddam

Description

This repository contains materials used in the study titled “An Explainable Artificial Intelligence Framework for Predicting and Segmenting Tourism Product Adoption Behaviour Using Digital Engagement Analytics.” It supports reproducible research in tourism analytics, smart tourism behaviour, explainable artificial intelligence (XAI), digital engagement modelling, and machine learning-driven tourist segmentation.

The study investigates how digital engagement behaviour influences tourism product adoption using a combination of predictive analytics, explainable AI techniques, and unsupervised behavioural clustering. The analytical framework integrates Logistic Regression, Random Forest Classification, SHAP (SHapley Additive Explanations), K-Means Clustering, and Principal Component Analysis (PCA) to model tourism product adoption behaviour and identify distinct tourist behavioural archetypes. The study demonstrates how behavioural engagement variables, including travel page interactions, social engagement activities, travel check-ins, browsing duration, and tourism-related digital participation, can predict tourism purchase decisions and reveal heterogeneous tourist segments.

The repository includes:
• original datasets,
• cleaned datasets,
• preprocessing scripts,
• exploratory data analysis outputs,
• predictive modelling scripts,
• explainable AI analysis,
• clustering and segmentation analysis,
• feature importance outputs,
• visualization files,
• reproducibility environment reports,
• supplementary research materials.

The datasets used in this study were acquired from publicly available Kaggle repositories and subsequently cleaned, transformed, and analysed for research purposes.
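The analytical framework described above can be illustrated with a minimal sketch. It is not the repository's code: the data are synthetic, the feature names (`page_views`, `social_likes`, `checkins`, `browsing_minutes`) and the choice of three clusters are hypothetical, and the Random Forest's impurity-based importances are used as a simpler stand-in for SHAP values so that the example depends only on numpy and scikit-learn.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; these column names are hypothetical.
rng = np.random.default_rng(42)
n = 600
feature_names = ["page_views", "social_likes", "checkins", "browsing_minutes"]
X = rng.normal(size=(n, len(feature_names)))

# Illustrative rule only: adoption depends on page views and check-ins.
signal = 1.5 * X[:, 0] + 1.0 * X[:, 2] + rng.normal(scale=0.5, size=n)
y = (signal > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Fit the scaler on the training split only to avoid leakage.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Two classifiers named in the description.
log_reg = LogisticRegression(max_iter=1000).fit(X_train_s, y_train)
forest = RandomForestClassifier(n_estimators=200, random_state=42).fit(
    X_train_s, y_train
)
log_reg_acc = log_reg.score(X_test_s, y_test)
forest_acc = forest.score(X_test_s, y_test)

# Impurity-based importances (assumed stand-in for SHAP in this sketch).
importances = dict(zip(feature_names, forest.feature_importances_))

# Segmentation: K-Means on scaled features, PCA for a 2-D view.
X_s = scaler.transform(X)
segments = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X_s)
coords = PCA(n_components=2).fit_transform(X_s)

print(f"Logistic Regression accuracy: {log_reg_acc:.3f}")
print(f"Random Forest accuracy: {forest_acc:.3f}")
print("Feature importances:", importances)
print("Segment sizes:", np.bincount(segments))
```

With the real data, the synthetic arrays would be replaced by the cleaned behavioural features and the adoption label, and `coords` could be plotted with one colour per segment to inspect the clusters.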
Primary behavioural engagement dataset: Customer Behaviour Tourism Dataset
Source: Kaggle
Available at: https://www.kaggle.com/code/ddosad/tourism-website-engagement-eda/notebook

Tourism package adoption dataset: Tour Package Prediction Dataset
Source: Kaggle
Available at: https://www.kaggle.com/code/yogidsba/travelpackageprediction-ensemble-techniques/notebook

The datasets contain behavioural, demographic, and tourism engagement variables associated with:
• tourism website interaction,
• travel content engagement,
• tourism social activity,
• outstation travel behaviour,
• tourism package adoption,
• tourist interaction intensity,
• travel frequency,
• tourism-related digital participation.

The repository is intended to support future research in:
• tourism analytics,
• explainable AI applications in tourism,
• smart tourism systems,
• tourism consumer behaviour,
• digital tourism marketing,
• machine learning reproducibility,
• tourism segmentation and personalization.

All analyses were conducted in Python within the JupyterLab environment, using libraries including pandas, numpy, scikit-learn, matplotlib, seaborn, and SHAP.

Files

Steps to reproduce

1. Download the original datasets:
• Customer Behaviour Tourism Dataset
• Tour Package Prediction Dataset

2. Install Python 3.10+, JupyterLab, and the required libraries:
• pandas
• numpy
• scikit-learn
• matplotlib
• seaborn
• shap

3. Execute the notebooks/scripts sequentially:
• Data Cleaning and Preprocessing
• Exploratory Data Analysis (EDA)
• Logistic Regression
• Random Forest Classification
• SHAP Explainability Analysis
• K-Means Clustering and PCA Visualization

Generated outputs, including feature importance results, clustering outputs, visualizations, and reproducibility reports, will be automatically saved in the results directory. All analyses were conducted using Python in the JupyterLab environment on a Windows-based system.
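Before running the notebooks, it may help to confirm that the environment matches the requirements listed above. The following is a minimal sketch using only the Python standard library; the function names `environment_report` and `python_ok` are hypothetical and this is not the repository's own reproducibility report script.

```python
from importlib import metadata
import platform
import sys

# Distribution names as listed in the steps above.
REQUIRED = ["pandas", "numpy", "scikit-learn", "matplotlib", "seaborn", "shap"]


def environment_report(packages=REQUIRED):
    """Return the Python version, OS name, and installed package versions.

    A package that is not installed is reported as None.
    """
    report = {
        "python": platform.python_version(),
        "platform": platform.system(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


def python_ok(minimum=(3, 10)):
    """Check that the interpreter meets the stated minimum version."""
    return sys.version_info[:2] >= minimum


for key, value in environment_report().items():
    print(f"{key}: {value if value is not None else 'NOT INSTALLED'}")
print("Python 3.10+:", python_ok())
```

Any entry printed as `NOT INSTALLED` would need to be installed before the corresponding notebook can run.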

Institutions

Categories

Tourism, Demographics, Market Segmentation

Licence