Theft detection in smart grid environment

Published: 6 June 2022 | Version 3 | DOI: 10.17632/c3c7329tjj.3
Salah Zidi, Alaeddine Mihoub, Saeed Mian Qaisar, Moez Krichen, Qasem Abu Al-Haija


The dataset contains energy consumption data for 16 different types of consumers. The original data comprises energy consumption measurements for several customers over one year (12 months), taken every hour. Six types of fraud are added to the original dataset, each representing a kind of theft that a consumer could commit. The first type is a considerable reduction of electricity consumption during the day, obtained by multiplying the consumption by a randomly chosen value between 0.1 and 0.8. In the second type, electricity consumption drops to zero at a random time and for an arbitrary period. The third type is similar to the first, but each hourly consumption value is multiplied by a random number. The fourth type reports a random fraction of the mean consumption. The fifth type reports the mean consumption, and the sixth type reverses the order of the readings. We developed a theft generator that randomly generates these six types of theft as described above.

The original data was collected from the Open Energy Data Initiative (OEDI) platform, a centralized repository of high-value energy research datasets aggregated from the U.S. Department of Energy’s Programs, Offices, and National Laboratories.

To use this dataset, please cite this article: Salah Zidi, Alaeddine Mihoub, Saeed Mian Qaisar, Moez Krichen, Qasem Abu Al-Haija, Theft detection dataset for benchmarking and machine learning based classification in a smart grid environment, Journal of King Saud University - Computer and Information Sciences, 2022, ISSN 1319-1578, (
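The six theft types described above can be illustrated with a minimal Python sketch. This is not the authors' theft generator. The function name `apply_theft`, the choice to work on one day of hourly readings, the contiguous zero window for type 2, and the factor ranges for types 3 and 4 are all assumptions made for illustration. Only the 0.1 to 0.8 range for type 1 comes from the description.

```python
import random
from statistics import mean


def apply_theft(day, theft_type, rng):
    """Return a tampered copy of one day of hourly readings.

    Hypothetical helper: `day` is a sequence of hourly consumption values,
    `theft_type` is an integer from 1 to 6, `rng` is a random.Random instance.
    """
    x = [float(v) for v in day]
    if theft_type == 1:
        # One factor in [0.1, 0.8] scales the whole day (range from the description).
        factor = rng.uniform(0.1, 0.8)
        return [v * factor for v in x]
    if theft_type == 2:
        # Consumption drops to zero over a random window.
        # Assumption: the window is contiguous and lies within the day.
        start = rng.randrange(len(x))
        stop = rng.randrange(start + 1, len(x) + 1)
        return [0.0 if start <= i < stop else v for i, v in enumerate(x)]
    if theft_type == 3:
        # Each hour gets its own random factor.
        # Assumption: same [0.1, 0.8] range as type 1.
        return [v * rng.uniform(0.1, 0.8) for v in x]
    if theft_type == 4:
        # Each hour reports a random fraction of the mean consumption.
        # Assumption: the fraction is drawn from [0, 1].
        m = mean(x)
        return [m * rng.uniform(0.0, 1.0) for _ in x]
    if theft_type == 5:
        # Every hour reports the mean consumption.
        m = mean(x)
        return [m] * len(x)
    if theft_type == 6:
        # The order of the readings is reversed.
        return x[::-1]
    raise ValueError("theft_type must be an integer from 1 to 6")


# Usage: tamper with a synthetic day of 24 hourly readings.
rng = random.Random(0)
day = list(range(1, 25))
reversed_day = apply_theft(day, 6, rng)   # readings in reverse order
flat_day = apply_theft(day, 5, rng)       # every hour equals the daily mean
```

Passing a seeded `random.Random` instance keeps the generated thefts reproducible, which is useful when labelled samples have to be regenerated for benchmarking.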



Machine Learning, Energy Consumption, Theft, Smart Grid