Daily Rainfall Observations from 460 Stations and Flash Flood Event Records across West Java, Indonesia (2020–2025)
Description
This repository provides a harmonized, station-level daily rainfall dataset and a verified flood event catalog covering the entire province of West Java, Indonesia, from 1 January 2020 to 31 December 2025. Daily rainfall observations were compiled from 460 stations operated by BMKG (synoptic, climatological, and rainfall stations) and by collaborative networks (BBWS, Pos Hujan Kerjasama). All stations were geolocated using WGS84 coordinates and aligned to the BPS 2023 administrative gazetteer. The flood event catalog (n = 4,231 events) was assembled from the National Disaster Data and Information (DIBI-BNPB) database and BPBD West Java reports, deduplicated, and cross-validated against media records. A consensus table (n = 3,514 events) links each flood event to the nearest rainfall observation within 25 km and ±1 day, enabling direct event–rainfall analysis.

The data were processed through a four-stage WMO-compliant quality control workflow: format harmonization, range checks, temporal consistency checks, and spatial consistency checks. Stations with more than 30% missing daily values are flagged but retained. The published rainfall matrix is left unimputed to preserve raw observational integrity, whereas the consensus table fills short gaps (≤2 days) using Inverse Distance Weighting (IDW, power = 2).

The dataset supports benchmarking of satellite (GSMaP, IMERG) and reanalysis (ERA5, MERRA-2) precipitation products, hydrological modeling, flood-risk assessment, and training of machine-learning forecasting models (e.g., XGBoost, LSTM) over a densely populated tropical region.
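As an illustration of the event–rainfall linking described above, the following sketch pairs one flood event with the nearest non-missing rainfall observation within 25 km and ±1 day. It is not the authors' code. The column names (station_id, lat, lon, date, rain_mm), the haversine great-circle distance, the representation of an event by a single point, the sample values, and the tie-breaking rule (closest station first, then smallest day offset) are all assumptions made for this example.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def link_event(event, stations, rain, max_km=25.0, max_days=1):
    """Return the nearest non-missing observation for one event, or None.

    Assumed rule: prefer the closest station, then the smallest day offset.
    """
    dist = haversine_km(event["lat"], event["lon"], stations["lat"], stations["lon"])
    near = stations.assign(dist_km=dist)
    near = near[near["dist_km"] <= max_km]

    # Attach the distance to every observation of the nearby stations.
    obs = rain.merge(near[["station_id", "dist_km"]], on="station_id")
    obs = obs.assign(day_offset=(obs["date"] - event["date"]).dt.days)
    obs = obs[(obs["day_offset"].abs() <= max_days) & obs["rain_mm"].notna()]
    if obs.empty:
        return None

    obs = obs.assign(abs_offset=obs["day_offset"].abs())
    return obs.sort_values(["dist_km", "abs_offset"]).iloc[0]


# Hypothetical sample data, for illustration only.
stations = pd.DataFrame({
    "station_id": ["A", "B", "C"],
    "lat": [-6.90, -6.95, -7.50],
    "lon": [107.60, 107.65, 108.50],
})
rain = pd.DataFrame({
    "station_id": ["A", "A", "B", "C"],
    "date": pd.to_datetime(["2021-02-10", "2021-02-11", "2021-02-10", "2021-02-10"]),
    "rain_mm": [np.nan, 12.0, 40.0, 5.0],
})
event = {"lat": -6.91, "lon": 107.61, "date": pd.Timestamp("2021-02-10")}

match = link_event(event, stations, rain)
far_event = {"lat": -5.0, "lon": 107.61, "date": pd.Timestamp("2021-02-10")}
no_match = link_event(far_event, stations, rain)
```

In this toy case station A is closest but has a missing value on the event day, so the sketch falls back to A's observation one day later rather than to the more distant station B. A different priority (for example, same-day observations first) would be an equally defensible reading of "nearest observation".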
Files
Steps to reproduce
This document describes the data collection workflow, instruments, software, and protocols used to compile the dataset. It is intended to enable independent reproduction of the data preparation steps reported in the accompanying Data in Brief manuscript.

1. Data Sources and Acquisition

Daily rainfall observations were obtained from the Meteorology, Climatology, and Geophysics Agency of Indonesia (Badan Meteorologi, Klimatologi, dan Geofisika, BMKG) through the official open-data portal https://dataonline.bmkg.go.id/. Records cover 460 rainfall stations distributed across the 27 administrative regencies/cities of West Java Province for the period 1 January 2020 to 31 December 2025 (2,192 days).

Flood event records were compiled from the Disaster Information Data (DIBI) portal of the Indonesian National Disaster Management Authority (Badan Nasional Penanggulangan Bencana, BNPB), https://dibi.bnpb.go.id/, complemented by bulletins from the West Java Provincial Disaster Management Agency (BPBD Provinsi Jawa Barat) and verified online news reports for cross-validation. A total of 4,231 flood events were catalogued for the same 2020–2025 period.

Administrative geocoding follows the official Indonesian regional code system (Kode Wilayah) issued by Statistics Indonesia (Badan Pusat Statistik, BPS) under Regulation No. 2 of 2023, ensuring consistent matching between rainfall stations, flood events, and administrative boundaries.

2. Instruments and Measurement Protocols

Rainfall is measured at BMKG stations using standard tipping-bucket and manual (Hellmann-type / Observatorium-type) rain gauges installed and maintained according to World Meteorological Organization (WMO) guidelines (WMO-No. 8, Guide to Instruments and Methods of Observation, 2018 edition). Daily rainfall accumulation is reported in millimetres (mm), measured over the standard 24-hour observation window (07:00–07:00 local time / WIB, UTC+7).
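Because 07:00 WIB (UTC+7) is 00:00 UTC, each 07:00–07:00 observation window coincides with one UTC calendar day, which matters when comparing gauge totals with satellite or reanalysis products timestamped in UTC. The sketch below finds the window containing a given timestamp. The function name and the sample timestamp are hypothetical, and because the text does not state whether a daily total is labeled with the window's start date or end date, the function returns both bounds rather than assuming a label.

```python
from datetime import datetime, timedelta, timezone

WIB = timezone(timedelta(hours=7), name="WIB")  # Western Indonesia Time, UTC+7


def observation_window(ts):
    """Return (start, end) of the 07:00-07:00 WIB window containing ts.

    ts must be timezone-aware; it is converted to WIB before the lookup.
    """
    ts_wib = ts.astimezone(WIB)
    start = ts_wib.replace(hour=7, minute=0, second=0, microsecond=0)
    if ts_wib < start:
        # Before 07:00 the timestamp still belongs to the previous window.
        start -= timedelta(days=1)
    return start, start + timedelta(days=1)


# Hypothetical example: a reading logged at 03:30 WIB on 10 February 2021.
reading = datetime(2021, 2, 10, 3, 30, tzinfo=WIB)
start, end = observation_window(reading)
start_utc = start.astimezone(timezone.utc)
```

Here the window runs from 07:00 WIB on 9 February to 07:00 WIB on 10 February, which is exactly 9 February in UTC.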
Flood events in DIBI/BPBD are documented by field officers at the regency/city level and include event date, location (regency/city), and impact metadata. For this dataset, only the date and administrative location were retained to construct a binary daily flood occurrence indicator per regency.

3. Software and Computing Environment

• Python 3.11
• pandas 2.2 and NumPy 1.26
• GeoPandas 0.14 and Shapely 2.0
• PyProj 3.6
• Matplotlib 3.8 and Seaborn 0.13
• Microsoft Excel (Office 365)
• QGIS 3.34 LTR

All scripts were executed on a Linux workstation (Ubuntu 22.04 LTS, Intel i7, 32 GB RAM). The full processing pipeline is deterministic and reproducible from the raw BMKG/BNPB downloads using the package versions listed above.
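The binary daily flood occurrence indicator per regency described in Section 2 could be built along the following lines. This is a minimal sketch, not the authors' pipeline: the column names (date, regency_code), the function name, and the sample codes and events are hypothetical and chosen only for illustration.

```python
import pandas as pd


def daily_flood_indicator(events, regency_codes,
                          start="2020-01-01", end="2025-12-31"):
    """Return a day x regency table with 1 where at least one flood occurred."""
    days = pd.date_range(start, end, freq="D")
    # Count events per (date, regency), then collapse counts to 0/1.
    counts = pd.crosstab(events["date"], events["regency_code"])
    flags = (counts > 0).astype("int8")
    # Expand to the full calendar and the full regency list; no event means 0.
    return flags.reindex(index=days, columns=list(regency_codes), fill_value=0)


# Hypothetical sample: two reports for the same regency and day, plus one other.
events = pd.DataFrame({
    "date": pd.to_datetime(["2021-02-10", "2021-02-10", "2022-11-05"]),
    "regency_code": ["3204", "3204", "3273"],
})
codes = ["3201", "3204", "3273"]  # illustrative subset of regency/city codes

flags = daily_flood_indicator(events, codes)
```

The resulting table has one row for each of the 2,192 days in the study period. Multiple reports for the same regency and day collapse to a single 1, which is what a binary occurrence indicator implies.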
Institutions
- IPB University, Bogor, West Java