COFINFAD: Colombian Fintech Financial Analytics Dataset
Description
Behavioral and transactional data from 48,723 customers of a Colombian fintech company, collected over 12 months from January 4, 2023, to December 29, 2023. The dataset comprises 3,159,157 individual transactions and is designed to support research on customer retention, financial behavior analysis, and digital financial service adoption in Latin American emerging markets.

- 57 variables covering demographics, transaction patterns, product adoption, and customer satisfaction
- Transaction values in Colombian Pesos (COP)
- Customer satisfaction scores on a 6-point scale
- Net Promoter Score (NPS) data
- Application usage metrics
- Churn probability indicators
- Geographic distribution across Colombian cities

For machine learning applications, this dataset is also available on Hugging Face: https://huggingface.co/datasets/luisdavidtrejosrojas/cofinfad
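The Hugging Face copy mentioned above can likely be loaded with the `datasets` library. The following is a minimal sketch, not an official loader: the helper name `load_cofinfad` is hypothetical, and the available configurations, split names, and column names are not stated on this page, so inspect the returned object before relying on any of them.

```python
# Repository id taken from the Hugging Face URL given in the description.
REPO_ID = "luisdavidtrejosrojas/cofinfad"


def load_cofinfad(split=None):
    """Download the dataset from the Hugging Face Hub (hypothetical helper).

    Requires `pip install datasets` and network access. With split=None the
    call returns every split the repository defines; which splits exist is
    an assumption to verify, not something this page documents.
    """
    # Imported lazily so that defining this helper does not need the package.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split)
```

Calling `load_cofinfad()` and printing the result shows the splits and features that are actually present, which is a reasonable first check before any modeling.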
Steps to reproduce
The dataset cannot be reproduced exactly because it is observational. The collection methodology was as follows:

- Daily automated API extraction from the company's CRM system and transaction databases, using Python scripts with the requests library during low-traffic hours. This captured 42 transaction-related variables, including amounts, timestamps, transaction types, merchant categories, and completion status.
- Quarterly in-app surveys (March, June, September, and December 2023) with a 14.3% average response rate. Each survey collected satisfaction metrics through 5 core questions on a 1-6 scale, plus product-specific evaluations and 3 open-ended feedback questions.
- Application usage analytics through integration with the company's existing mobile analytics infrastructure, capturing user session metrics (duration, frequency, time patterns), feature interaction data, screen views, button interactions, and performance metrics.
- Data validation: record count comparison with source system logs, removal of test accounts and internal transactions, duplicate record elimination, statistical outlier analysis, and business rule verification.
- Anonymization following Colombian data protection regulations (Ley 1581 de 2012): removal of personal identifiers, use of non-reversible unique codes, geographic aggregation to city level, and privacy-preserving data transformations.
- Data integration: consolidation of multiple sources using customer identifiers with referential integrity validation, format standardization across all data types and structures, and systematic alignment of temporal data from different collection methods.
- Calculation of derived metrics, including customer lifetime value, behavioral indicators, satisfaction score aggregation from survey responses, transaction frequency patterns, and churn probability predictions using a 30-day forecast window.
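Some of the validation and anonymization steps above can be illustrated in a few lines of standard-library Python. This is a sketch under stated assumptions, not the authors' pipeline: the field names (`transaction_id`, `customer_id`, `amount_cop`), the sample rows, the IQR rule for outliers, and the keyed HMAC-SHA256 scheme for non-reversible codes are all illustrative choices that the description does not specify.

```python
import hashlib
import hmac
import statistics

# Made-up sample rows; the real dataset's column names may differ.
rows = [
    {"transaction_id": "t1", "customer_id": "c1", "amount_cop": 50000},
    {"transaction_id": "t2", "customer_id": "c2", "amount_cop": 52000},
    {"transaction_id": "t2", "customer_id": "c2", "amount_cop": 52000},  # duplicate
    {"transaction_id": "t3", "customer_id": "test_01", "amount_cop": 1},  # test account
    {"transaction_id": "t4", "customer_id": "c3", "amount_cop": 48000},
    {"transaction_id": "t5", "customer_id": "c1", "amount_cop": 51000},
    {"transaction_id": "t6", "customer_id": "c4", "amount_cop": 49000},
    {"transaction_id": "t7", "customer_id": "c2", "amount_cop": 9000000},
    {"transaction_id": "t8", "customer_id": "c3", "amount_cop": 50500},
    {"transaction_id": "t9", "customer_id": "c4", "amount_cop": 49500},
]


def drop_test_accounts(records, test_ids):
    """Remove rows that belong to known test or internal accounts."""
    return [r for r in records if r["customer_id"] not in test_ids]


def drop_duplicates(records, key="transaction_id"):
    """Keep only the first occurrence of each transaction id."""
    seen = set()
    unique = []
    for r in records:
        if r[key] not in seen:
            seen.add(r[key])
            unique.append(r)
    return unique


def flag_outliers_iqr(records, field="amount_cop", k=1.5):
    """Return rows whose value falls outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = [r[field] for r in records]
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [r for r in records if not (low <= r[field] <= high)]


def pseudonymize(customer_id, secret_key):
    """Derive a stable code from an identifier with a keyed SHA-256 hash.

    Assumption: a keyed hash stands in for the "non-reversible unique
    codes" named in the description; the actual scheme is not documented.
    """
    digest = hmac.new(secret_key, customer_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


cleaned = drop_duplicates(drop_test_accounts(rows, {"test_01"}))
outliers = flag_outliers_iqr(cleaned)
codes = {r["customer_id"]: pseudonymize(r["customer_id"], b"example-key") for r in cleaned}
```

Flagging outliers instead of deleting them keeps the decision reviewable, which suits an "analysis" step. In practice the secret key would be stored separately from the released data, or discarded, so that codes cannot be recomputed from known identifiers.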
Institutions
- Universidad Tecnologica de Pereira