FW-Flow: A curated multi-class firewall flow dataset with engineered flow-based and statistical features

Published: 4 March 2026 | Version 1 | DOI: 10.17632/88p4xh8td9.1
Contributors:
Linh Dam Minh

Description

The dataset is derived from the publicly available Internet Firewall Data dataset provided by the UCI Machine Learning Repository (DOI: 10.24432/C5131M). The original data were collected from internet traffic records on a university firewall and comprise 65,532 multi-class labeled flow records.

- FW-Flow provides a curated and feature-engineered version of the data, including the original 12 raw firewall attributes (ports, NAT mappings, bytes, packets, elapsed time) together with 30 additional flow-based, behavioral, and statistical derived features, resulting in 42 structured attributes.
- The dataset supports multi-class classification (allow, deny, drop, reset-both), anomaly detection, and traffic behavior modeling research.
- Approximately 43% of flows exhibit zero-second duration due to logging granularity in the original records. A small epsilon value (1e-6) was applied during rate-based feature computation to ensure numerical stability.

All preprocessing and feature engineering procedures are fully reproducible via the accompanying scripts.

[1] Internet Firewall Data [Dataset], UCI Machine Learning Repository, 2019. DOI: 10.24432/C5131M.
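The epsilon correction for rate-based features can be illustrated with a minimal sketch. It assumes the epsilon is added to the elapsed time in the denominator. The description gives only the value (1e-6), not where it is applied, so the placement is an assumption. The function name `safe_rate` and the sample records are hypothetical and are not taken from the accompanying scripts.

```python
EPSILON = 1e-6  # value stated in the dataset description


def safe_rate(amount: float, elapsed_seconds: float, eps: float = EPSILON) -> float:
    """Per-second rate with eps added to the duration (assumed placement)."""
    return amount / (elapsed_seconds + eps)


# Hypothetical records: (bytes, packets, elapsed time in seconds)
flows = [(177, 2, 30), (70, 1, 0), (4768, 19, 1199)]

for total_bytes, packets, elapsed in flows:
    bytes_per_sec = safe_rate(total_bytes, elapsed)
    packets_per_sec = safe_rate(packets, elapsed)
    print(f"{elapsed:>5}s  {bytes_per_sec:.3f} B/s  {packets_per_sec:.3f} pkt/s")
```

Under this assumption, a zero-duration flow produces a finite rate that is roughly a million times its raw count. Users may want to check how the zero-duration flows (about 43% of records) are distributed in the rate columns before scaling or modeling.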

Files

Steps to reproduce

1. Download the original Internet Firewall Data dataset from the UCI Machine Learning Repository (DOI: 10.24432/C5131M) or use the provided fw_flow_raw.csv file.
2. Execute the provided Python/Colab preprocessing script.
3. Compute derived flow-based, behavioral, and statistical features (30 additional attributes).
4. Apply epsilon (1e-6) correction for rate-based features to ensure numerical stability.
5. The final structured dataset (fw_flow_engineered.csv) containing 42 attributes will be generated.
6. Use the action_encoded column for multi-class classification experiments.
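As a rough illustration of the last step, the sketch below separates the action_encoded label from the remaining columns using only the Python standard library. Several details are assumptions and are not confirmed by the description: that action_encoded holds integer class codes, that the engineered file also keeps a textual Action column (excluded here so the label does not leak into the features), and that all other columns are numeric. The helper name `load_features_and_labels` and the two-row sample are hypothetical.

```python
import csv
import io


def load_features_and_labels(handle, label_column="action_encoded", exclude=("Action",)):
    """Split CSV rows into feature dicts and integer labels.

    Assumes integer-coded labels and numeric feature columns; the columns
    named in `exclude` are dropped (assumed to duplicate the label as text).
    """
    reader = csv.DictReader(handle)
    features, labels = [], []
    for row in reader:
        labels.append(int(row[label_column]))
        features.append({
            name: float(value)
            for name, value in row.items()
            if name != label_column and name not in exclude
        })
    return features, labels


# Hypothetical two-row sample standing in for fw_flow_engineered.csv
sample = io.StringIO(
    "Bytes,Packets,Action,action_encoded\n"
    "177,2,allow,0\n"
    "70,1,deny,1\n"
)
X, y = load_features_and_labels(sample)
print(len(X), "rows;", "labels:", y)
```

For the real file, the same function could be called on a handle opened with open("fw_flow_engineered.csv", newline=""), after adjusting the excluded column names to match the actual header.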

Categories

Computer Science, Network Security, Machine Learning, Intrusion Detection

Licence