electricity consumption invoice data

Published: 13 September 2021 | Version 1 | DOI: 10.17632/nwwvh8nt63.1
Contributor:
WAN NUR ATIRAH WAN MOHD ADNAN

Description

This is pre-processed data for fraud detection in electricity and gas consumption, obtained from Kaggle, a public platform for sharing datasets. There are two datasets. In the first, duplicates and missing values have already been removed, and the data have been filtered to rows for electricity consumption with a client category of 11 and a counter coefficient of 1. Only invoices dated in 2019 are included for model training. The second dataset is the same data with random undersampling applied (step 6 under Steps to reproduce).

Files

Steps to reproduce

1) Merge the client and invoice data using the one-to-many option.
2) Remove duplicates and missing values.
3) Select only rows with counter_type = 'ELEC', client_catg = 11, counter_coefficient = 1 and an invoice_date in 2019.
4) Re-encode the categorical variables with new numbering, except for counter_statue and target.
5) Apply standardization to the continuous variables.
6) Apply random undersampling with a factor of 0.06 for non-fraud and 1 for fraud (2nd dataset only).
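
The steps above can be sketched in Python with pandas. This is only an illustration under stated assumptions, not the contributor's actual script. The rows are synthetic. The merge key client_id and the columns tarif_type and consommation_level_1 are assumed names that do not appear in this record. The sketch also assumes that "new numbering" means consecutive integer codes, that standardization means a z-score with the population standard deviation, that target = 1 marks fraud, and that the undersampling factors are the fractions of each class that are kept.

```python
import pandas as pd

# Synthetic stand-ins for the client and invoice tables (not real data).
# client_id, tarif_type and consommation_level_1 are assumed column names.
client = pd.DataFrame({
    "client_id": [1, 2, 3, 4],
    "client_catg": [11, 11, 12, 11],
    "target": [0, 1, 0, 0],
})
invoice = pd.DataFrame({
    "client_id": [1, 1, 2, 2, 3, 4, 4, 4],
    "invoice_date": ["2019-01-05", "2018-03-01", "2019-02-10", "2019-02-10",
                     "2019-04-01", "2019-05-20", "2019-06-15", "2019-07-01"],
    "counter_type": ["ELEC", "ELEC", "ELEC", "ELEC",
                     "ELEC", "GAZ", "ELEC", "ELEC"],
    "counter_coefficient": [1, 1, 1, 1, 1, 1, 1, 1],
    "counter_statue": [0, 0, 0, 0, 0, 0, 1, 0],
    "tarif_type": [11, 11, 40, 40, 11, 40, 10, 11],
    "consommation_level_1": [100.0, 80.0, 300.0, 300.0,
                             50.0, 20.0, 200.0, None],
})


def preprocess(client, invoice, categorical_cols, continuous_cols):
    """Steps 1 to 5: merge, clean, filter, re-encode, standardize."""
    # Step 1: one client row may match many invoice rows.
    merged = client.merge(invoice, on="client_id", how="inner",
                          validate="one_to_many")
    # Step 2: remove duplicate rows and rows with missing values.
    merged = merged.drop_duplicates().dropna()
    # Step 3: keep electricity invoices from 2019 for client category 11
    # with a counter coefficient of 1.
    year = pd.to_datetime(merged["invoice_date"]).dt.year
    mask = (
        (merged["counter_type"] == "ELEC")
        & (merged["client_catg"] == 11)
        & (merged["counter_coefficient"] == 1)
        & (year == 2019)
    )
    out = merged[mask].copy()
    # Step 4: replace each category with a consecutive integer code
    # (assumed meaning of "new numbering"); counter_statue and target
    # are left out of categorical_cols and so stay unchanged.
    for col in categorical_cols:
        out[col] = pd.factorize(out[col], sort=True)[0]
    # Step 5: z-score with the population standard deviation (assumption).
    for col in continuous_cols:
        out[col] = (out[col] - out[col].mean()) / out[col].std(ddof=0)
    return out.reset_index(drop=True)


def undersample(df, non_fraud_frac=0.06, seed=0):
    """Step 6: keep every fraud row and a random fraction of non-fraud rows."""
    fraud = df[df["target"] == 1]
    non_fraud = df[df["target"] == 0].sample(frac=non_fraud_frac,
                                             random_state=seed)
    combined = pd.concat([fraud, non_fraud])
    # Shuffle so the two classes are not grouped together.
    return combined.sample(frac=1.0, random_state=seed).reset_index(drop=True)


first_dataset = preprocess(client, invoice,
                           categorical_cols=["tarif_type"],
                           continuous_cols=["consommation_level_1"])
```

With real data, the second dataset would correspond to calling undersample on the output of preprocess. The toy table here is too small for a 0.06 fraction to keep any non-fraud rows, so that call is left out of the example.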