BigFlow-NIDS
Description
BigFlow-NIDS is a large-scale, NetFlow-based dataset and accompanying analysis pipeline designed for intrusion-detection research in big-data environments. It was created by merging four major benchmark NetFlow datasets (NF-UNSW-NB15-v3, NF-ToN-IoT-v3, NF-BoT-IoT-v3, and NF-CSE-CIC-IDS2018-v3) using an Apache Spark preprocessing pipeline that performs deduplication, missing-value handling, label encoding, and feature harmonization. The final release contains 66,935,021 flows, 55 flow attributes, and 32 fine-grained attack categories. To support scalable machine learning, batch analytics, and streaming workloads, the dataset is distributed in CSV format and in Parquet format across 67 partitioned files, which allows efficient parallel I/O and high-throughput analytics.
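
The four preprocessing steps named above (feature harmonization, missing-value handling, deduplication, and label encoding) can be illustrated with a minimal pure-Python sketch on toy records. This is not the Spark pipeline itself. The column names, the default fill value of 0, the order of the steps, and all function names are assumptions made for illustration only.

```python
from typing import Dict, List, Sequence, Tuple

Flow = Dict[str, object]

# Hypothetical shared schema; the real release has 55 flow attributes.
SCHEMA = ("IPV4_SRC_ADDR", "L4_DST_PORT", "IN_BYTES", "Attack")
LABEL_COL = "Attack"


def harmonize(flow: Flow, schema: Sequence[str]) -> Flow:
    """Project a flow onto the shared schema; absent columns become None."""
    return {col: flow.get(col) for col in schema}


def fill_missing(flow: Flow, label_col: str, default: object = 0) -> Flow:
    """Replace missing feature values with a default (labels are left alone)."""
    return {
        col: default if value is None and col != label_col else value
        for col, value in flow.items()
    }


def deduplicate(flows: List[Flow], schema: Sequence[str]) -> List[Flow]:
    """Drop exact duplicate rows, keeping the first occurrence."""
    seen = set()
    unique = []
    for flow in flows:
        key = tuple(flow[col] for col in schema)
        if key not in seen:
            seen.add(key)
            unique.append(flow)
    return unique


def encode_labels(
    flows: List[Flow], label_col: str
) -> Tuple[List[Flow], Dict[str, int]]:
    """Map each attack name to an integer id, in sorted name order."""
    classes = sorted({str(flow[label_col]) for flow in flows})
    mapping = {name: idx for idx, name in enumerate(classes)}
    encoded = [{**flow, label_col: mapping[str(flow[label_col])]} for flow in flows]
    return encoded, mapping


def merge_sources(
    sources: Sequence[List[Flow]],
    schema: Sequence[str] = SCHEMA,
    label_col: str = LABEL_COL,
) -> Tuple[List[Flow], Dict[str, int]]:
    """Harmonize and fill every source, then deduplicate and encode labels."""
    merged = [
        fill_missing(harmonize(flow, schema), label_col)
        for flows in sources
        for flow in flows
    ]
    return encode_labels(deduplicate(merged, schema), label_col)


# Toy inputs standing in for two of the source datasets (made-up values).
source_a = [
    {"IPV4_SRC_ADDR": "10.0.0.1", "L4_DST_PORT": 80, "IN_BYTES": 120, "Attack": "Benign"},
    {"IPV4_SRC_ADDR": "10.0.0.1", "L4_DST_PORT": 80, "IN_BYTES": 120, "Attack": "Benign"},
]
source_b = [
    # Missing IN_BYTES and carrying a column outside the shared schema.
    {"IPV4_SRC_ADDR": "10.0.0.2", "L4_DST_PORT": 22, "Attack": "DDoS", "EXTRA": 1},
]

flows, label_mapping = merge_sources([source_a, source_b])
```

On the toy input, the duplicate row from the first source is dropped, the extra column from the second source is discarded, and its missing byte count is filled with the default. At the scale of the actual release the same steps would run as distributed DataFrame operations rather than in-memory loops.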