anonymized loan applications USA_ETL

Published: 25 June 2025| Version 1 | DOI: 10.17632/tx2v248cx4.1
Contributor:
Sayyed Khawar Abbas

Description

Bank Customer Dataset for Personal Loan Prediction This dataset contains demographic, financial, and behavioral data of 5,000 bank customers collected during a marketing campaign aimed at offering personal loans. The primary objective is to predict whether a customer accepted the personal loan offer (personal_loan), making this a supervised binary classification problem. The dataset includes 14 features such as age, income, credit card usage, education level, mortgage value, and account ownership information. It can be used for machine learning tasks such as classification modeling, feature selection, customer segmentation, and marketing analytics.

Files

Steps to reproduce

──────────────────────────────────────── BANK_PERSONAL_LOAN_CLEANING WORKFLOW ──────────────────────────────────────── Source file : Bank_Personal_Loan_Modelling(1).csv Target file : Bank_Personal_Loan_Cleaned.xlsx (✓ ready for Mendeley) STEP 0 – Open & quick scan Import the CSV into Excel → new sheet “Raw”. Filter row 1 (headers) and press Ctrl + Shift + End to confirm 5 000 rows × 14 columns. STEP 1 – Remove redundant ID column Column A “ID” is a simple row index; right-click → Delete Column. Rename the sheet “Stage1_NoID”. STEP 2 – check for blanks / impossible values a) Press Ctrl + G → Special → Blanks to highlight empties. Result: none found (dataset is already complete). b) Sort each numeric column high-to-low to spot negatives where none should exist: Age, Experience, Income, CCAvg, Mortgage ≥ 0 ✔ ZIP Code is 3-digit; all between 901 and 966 ✔ STEP 3 – data-type fixes Select ZIP Code column → Format Cells → Text (prevents leading-zero loss). Confirm Education, Family, CD Account, Securities Account, Online, CreditCard are integers (General format is fine). STEP 4 – create modelling-friendly fields Add a new sheet “Stage2_Features” and copy all data in. Then: 4.1 Standard‐scale continuous variables (better model convergence) In columns to the right insert: =STANDARDIZE([@Age], AVERAGE(Age), STDEV(Age)) Repeat for Experience, Income, CCAvg, Mortgage. Name the columns z_Age, z_Experience … z_Mortgage. 4.2 One-hot encode Education if preferred Insert three new columns: Edu_Undergrad, Edu_Graduate, Edu_Advanced. Use =--([@Education]=1) etc. to flag 0/1. 4.3 Derive “AgeBand” (categorical bias slice) =IF(Age<40,"<40",IF(Age<60,"40-59","60+")) STEP 5 – rename headers for clarity Personal Loan → Loan_Accepted CCavg → Monthly_CC_Spend Securities Account → Securities_Acct CD Account → CD_Acct (Keep names < 20 characters for tidy plots.) STEP 6 – column order (human-friendly) 1 ZIP_Code 2 Age 3 AgeBand 4 Experience 5 Income 6 Family 7 Education 8 Monthly_CC_Spend 9 Mortgage 10 Securities_Acct 11 CD_Acct 12 Online 13 CreditCard 14 Loan_Accepted 15–19 all z_* columns (optional) 20–22 Edu_* one-hots (optional) STEP 7 – freeze & filter View → Freeze Top Row. Turn on Table format (Ctrl + T) for auto-filter dropdowns. STEP 8 – export to cleaned file File → Save As → Bank_Personal_Loan_Cleaned.xlsx ✓ Include only the “Stage2_Features” sheet. ✓ Metadata: author, date, licence (CC-BY-4.0), study DOI placeholder. STEP 9 – document the log In a second sheet “README” paste these steps verbatim. Add a citation line: “Clean-sheet used in Julius AI workflow, 2025-06-22.” ──────────────────────────────────────── Deliverables for the Mendeley repository ──────────────────────────────────────── 1 Bank_Personal_Loan_Cleaned.xlsx (final data) 2 README.txt / README sheet (workflow above) 3 scripts_or_notebook.zip (optional Python *.ipynb, plots, flow-chart)

Institutions

Budapesti Corvinus Egyetem

Categories

Banking, Personal Finance

Licence