CreditTransAct: A Profile-Driven Dataset for Scalable Credit Card Fraud Detection
Description
CreditTransAct is a large-scale synthetic dataset of 15 million credit card transactions generated for fraud detection research. It is divided into four segments: A_Established (40%), B_Regular (30%), C_New (20%), and D_Guest (10%), with segment-wise fraud rates of 2.5%, 4.0%, 7.0%, and 12.0% respectively. Overall, 95.10% of transactions are legitimate and 4.90% are fraudulent.

Segments A, B, and C are built on persistent customer profiles that include behavioral attributes such as baseline spending, usual merchant category, account age, and credential change history. Segment D represents anonymous transactions drawn from global distributions, with no historical customer context. Each record has 33 features covering transaction amount, geographic behavior, device and network signals, authentication details, and behavioral patterns. Fraud cases are generated from four compound signal clusters: Card-Not-Present, Bot/Card Testing, Account Takeover, and Geo-Velocity. Controlled label noise is introduced in borderline cases to simulate real-world uncertainty.

The dataset is produced by a reproducible, memory-efficient pipeline built with Python, NumPy, Pandas, and PyArrow. It is provided in Snappy-compressed Parquet format for efficient storage and in CSV format for easy accessibility. It is suitable for studying and assessing fraud behavior in realistic financial transaction settings.
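The segment shares and fraud rates stated in the description can be combined into expected per-segment counts and a share-weighted overall fraud rate. The sketch below is only an arithmetic check, not the dataset's generation pipeline; the names `SEGMENTS`, `TOTAL_TRANSACTIONS`, and `expected_counts` are hypothetical. With the stated figures the weighted rate comes to 4.80%, slightly below the reported 4.90%. The description does not explain the gap; the controlled label noise may contribute to it, but that is an assumption.

```python
# Segment shares and fraud rates as stated in the description.
# All names here are illustrative, not taken from the dataset's pipeline.
SEGMENTS = {
    "A_Established": (0.40, 0.025),
    "B_Regular": (0.30, 0.040),
    "C_New": (0.20, 0.070),
    "D_Guest": (0.10, 0.120),
}
TOTAL_TRANSACTIONS = 15_000_000


def expected_counts(segments, total):
    """Return {segment: (transactions, expected fraudulent transactions)}."""
    rows = {}
    for name, (share, fraud_rate) in segments.items():
        n = round(total * share)
        rows[name] = (n, round(n * fraud_rate))
    return rows


counts = expected_counts(SEGMENTS, TOTAL_TRANSACTIONS)
total_fraud = sum(fraud for _, fraud in counts.values())
overall_rate = total_fraud / TOTAL_TRANSACTIONS

for name, (n, fraud) in counts.items():
    print(f"{name:14s} {n:>10,d} transactions, ~{fraud:>7,d} fraudulent")
print(f"Share-weighted fraud rate: {overall_rate:.2%}")
```

Because the rates differ by almost a factor of five across segments, evaluating a model per segment as well as overall is likely to be more informative than a single aggregate metric.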
Files
Institutions
- Chittagong University of Engineering & Technology, Chittagong, Chittagong