COFINFAD: Colombian Fintech Financial Analytics Dataset

Published: 6 June 2025 | Version 1 | DOI: 10.17632/mhb4zn3258.1
Contributors:
Yony Fernando Ceballos, Luis David Trejos Rojas

Description

This dataset contains behavioral and transactional data from 48,723 customers of a Colombian fintech company, collected over 12 months from January 4, 2023, to December 29, 2023. It comprises 3,159,157 individual transactions and is designed to support research on customer retention, financial behavior analysis, and digital financial service adoption in Latin American emerging markets.

The dataset includes:

  • 57 variables covering demographics, transaction patterns, product adoption, and customer satisfaction
  • Transaction values in Colombian Pesos (COP)
  • Customer satisfaction scores on a 6-point scale
  • Net Promoter Score (NPS) data
  • Application usage metrics
  • Churn probability indicators
  • Geographic distribution across Colombian cities

For machine learning applications, this dataset is also available on Hugging Face: https://huggingface.co/datasets/luisdavidtrejosrojas/cofinfad
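As a starting point for the Hugging Face copy, the following is a minimal sketch of how the dataset might be loaded with the third-party `datasets` library. The repository id is taken from the URL above. The helper names `repo_id_from_url` and `load_cofinfad` are hypothetical, and the available splits and configurations are not stated on this page, so the loader is called with defaults only.

```python
from urllib.parse import urlparse

# URL as given in the dataset description.
HF_URL = "https://huggingface.co/datasets/luisdavidtrejosrojas/cofinfad"


def repo_id_from_url(url):
    """Extract '<owner>/<name>' from a Hugging Face dataset URL."""
    parts = urlparse(url).path.strip("/").split("/")
    # Expected path layout: datasets/<owner>/<name>
    if len(parts) != 3 or parts[0] != "datasets":
        raise ValueError(f"not a Hugging Face dataset URL: {url}")
    return "/".join(parts[1:])


def load_cofinfad():
    """Download the dataset (needs network access and `pip install datasets`)."""
    # Imported lazily so the URL helper works without the third-party package.
    from datasets import load_dataset

    # Assumption: the default configuration is sufficient; split names are
    # not documented here, so inspect the returned object before indexing.
    return load_dataset(repo_id_from_url(HF_URL))
```

Calling `load_cofinfad()` and printing the result should list whatever splits and columns the repository exposes, which is a quick way to check them against the 57 variables described above.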

Steps to reproduce

This dataset cannot be reproduced exactly because it is observational. The collection methodology was as follows:

  • Transaction data: daily automated API extraction from the company's CRM system and transaction databases, using Python scripts with the requests library during low-traffic hours. This captured 42 transaction-related variables, including amounts, timestamps, transaction types, merchant categories, and completion status.
  • Surveys: quarterly in-app survey deployment (March, June, September, and December 2023) with a 14.3% average response rate. Satisfaction metrics were collected through 5 core questions on a 1-6 scale, plus product-specific evaluations and 3 open-ended feedback questions.
  • Application usage analytics: integration with the company's existing mobile analytics infrastructure, capturing user session metrics (duration, frequency, time patterns), feature interaction data, screen views, button interactions, and performance metrics.
  • Data validation: record count comparison with source system logs, removal of test accounts and internal transactions, duplicate record elimination, statistical outlier analysis, and business rule verification.
  • Anonymization: procedures following Colombian data protection regulations (Ley 1581 de 2012), including removal of personal identifiers, use of non-reversible unique codes, geographic aggregation to city level, and privacy-preserving data transformations.
  • Data integration: consolidation of multiple sources using customer identifiers with referential integrity validation, format standardization across all data types and structures, and systematic alignment of temporal data from different collection methods.
  • Derived metrics: calculation of customer lifetime value, behavioral indicators, satisfaction score aggregation from survey responses, transaction frequency patterns, and churn probability predictions using a 30-day forecast window.
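To make the validation and anonymization steps above more concrete, here is a small illustrative sketch using only the Python standard library. It is not the authors' pipeline. The record fields, the `is_test` flag, the IQR outlier rule, and the keyed-hash (HMAC-SHA256) pseudonymization are all assumptions chosen for illustration, since the description names the steps but not the techniques behind them.

```python
import hashlib
import hmac
import statistics

# Hypothetical raw records; field names are illustrative, not the dataset schema.
RAW = [
    {"txn_id": "t1", "customer": "alice@example.com", "amount_cop": 50_000, "is_test": False},
    {"txn_id": "t2", "customer": "bob@example.com", "amount_cop": 62_000, "is_test": False},
    {"txn_id": "t2", "customer": "bob@example.com", "amount_cop": 62_000, "is_test": False},
    {"txn_id": "t3", "customer": "qa@internal.test", "amount_cop": 1_000, "is_test": True},
    {"txn_id": "t4", "customer": "alice@example.com", "amount_cop": 55_000, "is_test": False},
    {"txn_id": "t5", "customer": "carla@example.com", "amount_cop": 58_000, "is_test": False},
    {"txn_id": "t6", "customer": "bob@example.com", "amount_cop": 9_000_000, "is_test": False},
]

# Assumption: a secret key stored separately from the published data.
SECRET_KEY = b"replace-with-a-secret-kept-outside-the-dataset"


def drop_test_and_duplicates(records):
    """Remove test-account rows and keep the first row per transaction id."""
    seen, kept = set(), []
    for rec in records:
        if rec["is_test"] or rec["txn_id"] in seen:
            continue
        seen.add(rec["txn_id"])
        kept.append(rec)
    return kept


def flag_outliers(records, k=1.5):
    """Flag amounts outside [Q1 - k*IQR, Q3 + k*IQR] (one common rule of thumb)."""
    amounts = [r["amount_cop"] for r in records]
    q1, _, q3 = statistics.quantiles(amounts, n=4, method="inclusive")
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [dict(r, is_outlier=not (low <= r["amount_cop"] <= high)) for r in records]


def pseudonymize(records):
    """Replace the identifier with a keyed hash that cannot be inverted without the key."""
    out = []
    for r in records:
        digest = hmac.new(SECRET_KEY, r["customer"].encode("utf-8"), hashlib.sha256)
        rec = {key: val for key, val in r.items() if key != "customer"}
        rec["customer_code"] = digest.hexdigest()[:16]
        out.append(rec)
    return out


clean = pseudonymize(flag_outliers(drop_test_and_duplicates(RAW)))
```

Flagging outliers instead of deleting them keeps the decision reversible, which suits a dataset where large transactions may be legitimate. The keyed hash gives the same code to the same customer across sources, so records can still be joined after identifiers are removed.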

Institutions

  • Universidad Tecnologica de Pereira

Categories

Social Sciences, Computer Science, Mathematics, Banking, Accounting, Business, Management, Statistics, Economics, Finance, Econometrics, Information System, Marketing, Data Science, Consumer Behavior, Decision Science

Licence