Paper_IJPE_Repository_1_Dataset_Purchases_Original_and_Augmented

Published: 24 February 2025 | Version 2 | DOI: 10.17632/24j2xp2xvy.2
Contributor:
SAMIA GAMOURA

Description

This repository contains two datasets:

- Original Dataset (100 rows): a manufacturer-provided dataset of purchased items.
- Augmented Dataset (10,000 rows): a synthetically generated dataset designed for use with the FP-Growth algorithm to extract risk interdependency rules.

The augmentation uses a synthetic data generation technique based on probability distributions, so that newly generated categorical values follow the probability distribution of the original data. To maintain logical consistency, the algorithm uses conditional probability distributions to preserve the relationships and dependencies between attributes. The approach is intended to produce realistic, coherent synthetic data that is statistically consistent with the original.
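One way to check that a synthetic column follows the original distribution is to compare the two empirical category distributions. The sketch below uses the total variation distance, which is our choice of measure and is not named in this repository; the column values are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def category_distribution(values):
    """Return the empirical probability of each category in `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def total_variation_distance(original, synthetic):
    """Half the L1 distance between two empirical categorical distributions.

    0.0 means the category proportions are identical, 1.0 means the two
    columns share no category at all.
    """
    p = category_distribution(original)
    q = category_distribution(synthetic)
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


# Hypothetical column values, for illustration only.
original_quality = ["High", "High", "Low", "Medium"]
synthetic_quality = ["High", "Low", "High", "Medium",
                     "High", "Low", "High", "Medium"]

print(total_variation_distance(original_quality, synthetic_quality))
```

In this toy case both columns have the same proportions (one half "High", one quarter each "Low" and "Medium"), so the distance is zero. On real data a small positive value is expected because of sampling noise.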

Steps to reproduce

Steps of the algorithm:

1. Reads the original CSV file (100 rows).
2. Extracts categorical distributions for each attribute.
3. Identifies pairwise dependencies (e.g., Quality vs. Price, Finances vs. Risk Flag).
4. Uses a conditional probability model with joint probability tables P(A, B) to improve relationship accuracy.
5. Generates 10,000 synthetic rows that preserve category distributions and dependencies.
6. Saves the synthetic dataset to a new CSV file.
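The steps above can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not the contributor's code: it assumes the dependencies form a simple chain in which each column is conditioned on the previous one, and the function names, column names, and sample rows are all hypothetical.

```python
import csv
import random
from collections import Counter, defaultdict


def read_rows(path):
    """Step 1: read the original CSV file into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def fit_chain(rows, columns):
    """Steps 2-4: count the first column's categories and, for each adjacent
    pair of columns, build a joint count table indexed as [parent][child]."""
    marginal = Counter(row[columns[0]] for row in rows)
    conditionals = []
    for parent, child in zip(columns, columns[1:]):
        table = defaultdict(Counter)
        for row in rows:
            table[row[parent]][row[child]] += 1
        conditionals.append(table)
    return marginal, conditionals


def draw(counter, rng):
    """Sample one category with probability proportional to its count."""
    return rng.choices(list(counter), weights=list(counter.values()), k=1)[0]


def generate(rows, columns, n_rows, seed=0):
    """Step 5: sample the first column from its marginal distribution, then
    each later column from P(column | previous column)."""
    rng = random.Random(seed)
    marginal, conditionals = fit_chain(rows, columns)
    synthetic = []
    for _ in range(n_rows):
        values = [draw(marginal, rng)]
        for table in conditionals:
            values.append(draw(table[values[-1]], rng))
        synthetic.append(dict(zip(columns, values)))
    return synthetic


def write_rows(path, rows, columns):
    """Step 6: save the synthetic rows to a new CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)


# Hypothetical rows and column names, for illustration only.
sample = [
    {"Quality": "High", "Price": "High"},
    {"Quality": "High", "Price": "Medium"},
    {"Quality": "Low", "Price": "Low"},
]
synthetic = generate(sample, ["Quality", "Price"], n_rows=10_000)
print(len(synthetic), synthetic[0])
```

With real files, the same functions would be combined as `write_rows("augmented.csv", generate(read_rows("original.csv"), columns, 10_000), columns)`, where both file names are placeholders. Because each value is drawn from counts observed in the original rows, the sketch can only produce attribute pairs that already occur in the original data.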

Institutions

EM Strasbourg Business School

Categories

Point of Purchase Promotion, Supplier Selection

Licence