Integrated Multi-Source, Multi-Country Retail Sales Dataset with Engineered Seasonal Features

Published: 27 April 2026 | Version 1 | DOI: 10.17632/v8h8wn4w37.1
Contributor:
Yi Xuan Phung

Description

This dataset is a processed, integrated retail sales dataset built from multiple public and private data sources across different regions. It covers the United Kingdom, Malaysia, and selected European countries (France, Spain, Portugal, and Germany), and combines real-world and synthetic retail data. It contains approximately 70,000 records with transaction-level and aggregated sales information, including date, product, category, price, quantity, and revenue.

Data from the heterogeneous sources were cleaned, standardized, and merged into a unified schema so that all records share a consistent structure. Additional temporal and seasonal features were engineered to support time-series analysis: weekday and weekend indicators, month identifiers, day-of-week values, country-specific holiday flags, and seasonal labels. These features capture calendar effects, such as holidays and seasonal cycles, as well as region-specific consumption patterns.

The dataset is designed to represent diverse retail environments and heterogeneous data conditions, including variation in product categories, regional behavior, and data granularity. It can be reused for time-series modeling, feature engineering, benchmarking, and cross-regional analysis under varying seasonal conditions.
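The standardization and feature-engineering steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the contributor's actual pipeline: the source column names, the small sample holiday calendar, the Monday=0 day-of-week encoding, the Northern Hemisphere meteorological season mapping, and the rule of deriving revenue as price × quantity when a source omits it are all hypothetical. An equatorial country such as Malaysia would likely need a different seasonal scheme, which the description does not specify.

```python
from datetime import date, datetime

# Hypothetical sample holiday calendar; a real pipeline would load a complete,
# year-specific public-holiday calendar for every country in the dataset.
HOLIDAYS = {
    "United Kingdom": {date(2024, 12, 25), date(2024, 12, 26)},
    "France": {date(2024, 7, 14), date(2024, 12, 25)},
    "Malaysia": {date(2024, 8, 31), date(2024, 12, 25)},
}

# Assumed Northern Hemisphere meteorological seasons, keyed by month number.
SEASONS = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def standardize(row: dict, column_map: dict, country: str, date_format: str) -> dict:
    """Map one source row onto a unified schema and coerce field types."""
    # Keep only the columns this source maps into the unified schema.
    renamed = {column_map[k]: v for k, v in row.items() if k in column_map}
    price = float(renamed["price"])
    quantity = int(renamed["quantity"])
    # Assumption: revenue is price x quantity when a source has no revenue column.
    revenue = float(renamed["revenue"]) if "revenue" in renamed else price * quantity
    return {
        "date": datetime.strptime(str(renamed["date"]).strip(), date_format).date(),
        "country": country,
        "product": str(renamed["product"]).strip(),
        "category": str(renamed.get("category", "Unknown")).strip(),
        "price": price,
        "quantity": quantity,
        "revenue": revenue,
    }


def add_temporal_features(record: dict) -> dict:
    """Return a copy of a standardized record with calendar features added."""
    d = record["date"]
    dow = d.weekday()  # Monday=0 ... Sunday=6
    return {
        **record,
        "day_of_week": dow,
        "is_weekend": dow >= 5,
        "month": d.month,
        "is_holiday": d in HOLIDAYS.get(record["country"], set()),
        "season": SEASONS[d.month],
    }
```

In this sketch each source would get its own `column_map` and date format; the standardized records can then be concatenated and passed through `add_temporal_features`, so every row carries the same columns regardless of where it came from.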

Categories

Time Series Analysis, Multivariate Analysis, Sales Forecasting, Retail Sector, Restaurant