Structured Dataset of Daily Electricity Demand, Generation, Load Shedding, and Supply Constraints in Bangladesh (2019–2024).

Published: 19 June 2025 | Version 1 | DOI: 10.17632/x7r7wdb39k.1

Description

This dataset presents a structured, multi-version compilation of daily electricity system records for Bangladesh, spanning the period from November 21, 2019, to December 30, 2024. It was developed by programmatically extracting 1,867 daily PDF reports from the publicly accessible archive of the Bangladesh Power Development Board (BPDB): https://misc.bpdb.gov.bd/daily-generation-archive. The dataset is organized into five progressive versions, each contained in a separate folder. These versions reflect successive enhancements, ranging from initial raw extraction to final preprocessing suitable for machine learning workflows. The dataset supports granular investigation at both national and divisional levels.

Version 1 comprises raw records parsed directly from the BPDB reports. It contains unprocessed inconsistencies, missing entries, and formatting noise. This version is preserved to support traceability.

Version 2 offers a cleaned and verified dataset in which duplicate entries were removed and missing values were recovered from the source files. Daily national and divisional records were reconciled and validated.

Version 3 adds temporal and calendar-based features. National and religious holidays were annotated manually and categorized by type. These features are intended to help capture behavioral variations in electricity consumption related to festive or reduced-activity periods.

Version 4 applies robust data curation techniques, including forward and backward interpolation for missing values, logical imputation for invalid demand records, and column-wise consistency checks. Two temporal attributes, year and month, were also added to facilitate seasonal analysis.

Version 5 is optimized for time series modeling. It incorporates smoothed corrections for outlier values using centered rolling medians and scales key numerical features for modeling readiness. The tabular structure remains unchanged from earlier versions, ensuring continuity for comparative analysis.
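The Version 4 and Version 5 processing can be illustrated with a minimal sketch. It is not the authors' code (the scripts are not distributed), and several details are assumptions: "forward and backward interpolation" is read here as a forward fill followed by a backward fill, the outlier rule is a relative deviation from the rolling median with a hypothetical threshold of 25%, and scaling is shown as min-max scaling because the description does not name the method. All function names are illustrative.

```python
from statistics import median


def fill_gaps(values):
    """Forward-fill missing entries (None), then backward-fill any leading gaps."""
    filled = list(values)
    last_seen = None
    for i, value in enumerate(filled):
        if value is None:
            filled[i] = last_seen
        else:
            last_seen = value
    next_seen = None
    for i in range(len(filled) - 1, -1, -1):
        if filled[i] is None:
            filled[i] = next_seen
        else:
            next_seen = filled[i]
    return filled


def rolling_median_correct(values, window=5, threshold=0.25):
    """Replace points that deviate from the centered rolling median.

    `threshold` is a hypothetical relative tolerance; the dataset description
    only says "a defined threshold". Windows are truncated at the series edges.
    """
    half = window // 2
    corrected = []
    for i, value in enumerate(values):
        start = max(0, i - half)
        stop = min(len(values), i + half + 1)
        local_median = median(values[start:stop])
        if abs(value - local_median) > threshold * abs(local_median):
            corrected.append(local_median)
        else:
            corrected.append(value)
    return corrected


def min_max_scale(values):
    """Scale values to [0, 1]; an assumed choice of normalization."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]
```

A typical order would be `fill_gaps`, then `rolling_median_correct`, then `min_max_scale`, applied per numeric column. In a pandas workflow, the centered median is available as `Series.rolling(window, center=True).median()`.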
Each version is supplied as a single .xlsx file with flat headers. A README.txt file explains the processing logic and provides a full breakdown of the steps followed in dataset construction. Although the source code is not included, the process is thoroughly documented to enable reproducibility. A separate column_descriptions.xlsx file describes all variables in the final version in detail.

The dataset exhibits multiple seasonal and temporal trends that reflect operational rhythms of Bangladesh’s energy system. Distinct patterns emerge across weekdays, months, and holidays, and regional variation is evident across administrative divisions. These characteristics make the dataset well-suited for predictive modeling, policy analysis, and infrastructure planning under varying demand scenarios.
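As a rough illustration of how the calendar features and the weekday or monthly patterns could be examined, the sketch below derives weekday, month, year, and holiday fields from a date and averages a numeric column by any of them. The field names (`weekday`, `holiday_type`, `demand_mw`) are hypothetical and may not match the headers in column_descriptions.xlsx. The two-entry holiday table is only a placeholder for the manually compiled annotations.

```python
from collections import defaultdict
from datetime import date

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")

# Placeholder lookup; the dataset's holiday annotations were compiled manually
# from official calendars and cover far more dates than these two.
HOLIDAYS = {
    date(2024, 3, 26): ("Independence Day", "national"),
    date(2024, 12, 16): ("Victory Day", "national"),
}


def calendar_features(day):
    """Derive calendar fields for one date (field names are illustrative)."""
    holiday_name, holiday_type = HOLIDAYS.get(day, (None, None))
    return {
        "date": day,
        "weekday": WEEKDAYS[day.weekday()],
        "month": day.month,
        "year": day.year,
        "holiday_name": holiday_name,
        "holiday_type": holiday_type,
    }


def mean_by_key(rows, key, value):
    """Average the `value` field of each row, grouped by the `key` field."""
    totals = defaultdict(lambda: [0.0, 0])
    for row in rows:
        totals[row[key]][0] += row[value]
        totals[row[key]][1] += 1
    return {group: total / count for group, (total, count) in totals.items()}
```

With rows built from the dataset, `mean_by_key(rows, "weekday", "demand_mw")` would give one average per weekday, and the same call with `"month"` or `"holiday_type"` would summarize the other groupings.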

Steps to reproduce

The dataset was constructed through a multi-stage workflow involving automated data retrieval, structured parsing, validation, and feature augmentation. All steps are documented in the accompanying README.txt, which outlines the process for independently reproducing the dataset using publicly available tools.

1. Data Access: Daily reports were downloaded from the publicly accessible BPDB archive: https://misc.bpdb.gov.bd/daily-generation-archive. Each report is a PDF that includes structured electricity statistics, typically on the second page.
2. PDF Retrieval: A custom Python script paginated through the archive, detected valid PDF links using regular expressions, and downloaded files with preserved filenames. This ensured full coverage from November 21, 2019, to December 30, 2024.
3. Text Extraction: Using pdfplumber, each PDF was processed to extract text from the table page while preserving layout. Files lacking expected structures were logged for manual review and possible recovery.
4. Text Parsing: Extracted content was parsed with regular expressions to identify values corresponding to daily demand, generation, regional breakdowns, and constraint indicators. Parsed values were stored in structured dictionaries and compiled into a pandas DataFrame.
5. Data Consolidation: Parsed records were appended into a cumulative table. When partial records or unreadable files were encountered, they were noted in a separate error log. Consolidated entries were exported as a .xlsx file.
6. Cleaning and Completion: Duplicate entries were removed. Missing days were identified by comparing against the full expected date range and manually appended from source files when available. Column formats were standardized across the dataset.
7. Feature Augmentation: Holiday names and types were manually classified using official calendars. Temporal markers such as weekday, month, and year were derived from the date field. These features allow for calendar-aware modeling.
8. Imputation and Smoothing: Missing values were filled using forward/backward interpolation or conditional averaging. Outliers were identified using a centered 5-day rolling median and replaced if they deviated beyond a defined threshold.
9. Normalization and Finalization: Key numeric fields were normalized to ensure compatibility with time series forecasting models. Finalized datasets were stored in versioned folders (V1–V5), with each version representing a specific state of processing. Each version is saved as a single .xlsx file with flat headers.

While the original scripts are not provided, this structured workflow allows users to reproduce or adapt the process using similar tools and logic.
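Steps 4 and 6 can be sketched as follows, assuming the page text has already been extracted (for example with pdfplumber, as in step 3). The label patterns and field names are hypothetical, since the exact wording of the BPDB reports is not reproduced here, and a real parser would need one pattern per reported quantity. Fields that cannot be found are returned as `None` so they can be written to an error log.

```python
import re
from datetime import date, timedelta

# A number such as 12,345 or 12345.6; must start with a digit.
NUMBER = r"(\d[\d,]*(?:\.\d+)?)"

# Hypothetical label patterns; the wording in the actual reports may differ.
FIELD_PATTERNS = {
    "demand_mw": re.compile(r"Demand\s*[:=]?\s*" + NUMBER, re.IGNORECASE),
    "generation_mw": re.compile(r"Generation\s*[:=]?\s*" + NUMBER, re.IGNORECASE),
    "load_shed_mw": re.compile(r"Load\s*Shed(?:ding)?\s*[:=]?\s*" + NUMBER,
                               re.IGNORECASE),
}


def parse_report_text(text):
    """Return one record per report; fields that are not found stay None."""
    record = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        record[field] = float(match.group(1).replace(",", "")) if match else None
    return record


def missing_dates(observed, start, end):
    """List every day in [start, end] that has no parsed record."""
    seen = set(observed)
    total_days = (end - start).days + 1
    expected = (start + timedelta(days=offset) for offset in range(total_days))
    return [day for day in expected if day not in seen]
```

Calling `missing_dates` with the dates of all parsed records and the coverage bounds `date(2019, 11, 21)` and `date(2024, 12, 30)` yields the days that still need to be recovered from the source files. With no records at all it returns 1,867 dates, which matches the number of daily reports stated in the description.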

Institutions

  • Islamic University of Technology

Categories

Electric Power, Time Series Prediction, Energy Demand, Energy Consumption, Bangladesh, Developing Countries, Deep Learning, Regional Analysis
