CICIDS2017-DOS (IB)

Published: 19 November 2025 | Version 1 | DOI: 10.17632/5zhgs9p39v.1
Contributors:
Ranjit Panigrahi, Shreyashi Jana, Moumita Pramanik, Paolo Barsocchi, Akash Kumar Bhoi, Biswajit Brahma

Description

CICIDS2017-DOS (IB) is an imbalanced intrusion detection dataset derived from the CICIDS2017 collection. It is limited to benign traffic and five Denial-of-Service (DoS) attack categories: DDoS, DoS GoldenEye, DoS Hulk, DoS Slowloris, and DoS SlowHTTPTest. The class distribution is intentionally skewed: benign instances make up approximately 80% of the samples, and each attack class contributes a smaller share. This imbalance reflects real-world network traffic, where malicious activity occurs sporadically compared with normal traffic. The dataset was generated by consolidating the raw files, removing incomplete or invalid entries, eliminating non-informative attributes, and converting textual fields such as IP addresses and timestamps into numeric form. The TUNE sampling process was then applied with the skew preserved, so that the majority (benign) traffic remains dominant while the minority attack classes are retained at limited but meaningful frequencies. The resulting dataset has 72 processed numerical features and is suitable for evaluating intrusion detection algorithms under realistic imbalance, particularly in studies of anomaly detection, rare-event learning, and imbalanced classification.

Please cite:

Panigrahi, R., & Borah, S. (2018). A detailed analysis of CICIDS2017 dataset for designing Intrusion Detection Systems. International Journal of Engineering & Technology, 7(3.24), 479-482.

Sharafaldin, I., Habibi Lashkari, A., & Ghorbani, A. A. (2018). Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization. 4th International Conference on Information Systems Security and Privacy (ICISSP), Portugal, January 2018.
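The cleaning and conversion steps mentioned in the description (removing invalid entries, turning IP addresses and timestamps into numbers) can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not the authors' code: rows are assumed to be dicts read from the CSV files with a "Label" column, timestamps are assumed to be day-first strings such as "4/7/2017 8:53", and they are interpreted as UTC because the description does not give a time zone. The function names ip_to_int, ts_to_millis, and is_valid are hypothetical.

```python
"""Hypothetical sketch of row validation and field conversion (stdlib only).

Assumptions (not stated in the dataset description): rows are dicts with a
"Label" column, timestamps look like "4/7/2017 8:53" (day/month/year), and
timestamps are treated as UTC because no time zone is given.
"""
import ipaddress
import math
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
TS_FORMAT = "%d/%m/%Y %H:%M"  # assumption: day-first, minute resolution


def ip_to_int(ip: str) -> int:
    """Convert an IPv4 (or IPv6) address string to its integer value."""
    return int(ipaddress.ip_address(ip.strip()))


def ts_to_millis(ts: str) -> int:
    """Convert a timestamp string to integer milliseconds since the Unix epoch."""
    dt = datetime.strptime(ts.strip(), TS_FORMAT).replace(tzinfo=timezone.utc)
    # Integer division of timedeltas keeps the result exact (no float rounding).
    return (dt - EPOCH) // timedelta(milliseconds=1)


def is_valid(row: dict, numeric_cols: list[str]) -> bool:
    """Reject rows with a missing label or NULL/NaN/infinite numeric values."""
    if not (row.get("Label") or "").strip():
        return False
    for col in numeric_cols:
        value = row.get(col)
        if value is None or value.strip() == "":
            return False  # NULL / empty cell
        try:
            x = float(value)  # accepts "NaN", "inf", "Infinity"
        except ValueError:
            return False  # non-numeric text in a numeric column
        if not math.isfinite(x):
            return False  # NaN or +/- infinity
    return True
```

For example, ip_to_int("192.168.10.5") yields 3232238085, and a row whose "Flow Bytes/s" cell reads "Infinity" is rejected by is_valid. Using ipaddress rather than manual string splitting also handles IPv6 addresses if any are present.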


Steps to reproduce

Input: CICIDS2017 raw dataset files
Output: CICIDS2017-DOS (IB) imbalanced dataset

Step 1: Merge all day-wise CICIDS2017 files into one dataset.
Step 2: Remove records with missing labels, NULL values, or infinite values.
Step 3: Filter the dataset to retain only the following classes: {Benign, DDoS, DoS Hulk, DoS GoldenEye, DoS Slowloris, DoS SlowHTTPTest}.
Step 4: Remove non-informative and redundant attributes.
Step 5: Convert textual fields to numeric form:
  a) Convert IP addresses to integer format.
  b) Convert timestamps to milliseconds since the epoch.
Step 6: Apply the TUNE Sampling Framework:
  6.1 Compute the sample count for each class.
  6.2 Calculate the median class size.
  6.3 Perform aggressive undersampling on classes whose size exceeds 2 × median.
  6.4 Perform moderate oversampling on classes whose size is below 0.5 × median.
  6.5 Keep classes whose size lies between these two thresholds unchanged.
Step 7: Ensure BENIGN remains approximately 80% of the total samples. Attack classes keep their TUNE-adjusted sizes without equalization.
Step 8: Shuffle the dataset to remove ordering bias.
Step 9: Save the output as CICIDS2017-DOS (IB).
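The class-resizing steps (6 to 8) can be sketched as below. This is a hypothetical illustration, not the authors' TUNE implementation: the steps fix the thresholds (2 × median and 0.5 × median) and the roughly 80% BENIGN share, but not how far a class is shrunk or grown. The sketch therefore assumes that large classes are capped at 2 × median, small classes are raised to 0.5 × median by random duplication, and BENIGN is then resized to 80% of the total. The names tune_targets, resample, UNDER_FACTOR, and OVER_FACTOR are illustrative.

```python
"""Hypothetical sketch of TUNE-style class resizing (Steps 6-8), stdlib only.

Only the thresholds (2 x median, 0.5 x median) and the ~80% BENIGN share come
from the steps; the resize targets below are ASSUMPTIONS, not the TUNE values.
"""
import random
import statistics

UNDER_FACTOR = 2.0   # assumption: "aggressive" undersampling caps a class at 2 x median
OVER_FACTOR = 0.5    # assumption: "moderate" oversampling raises a class to 0.5 x median
BENIGN_SHARE = 0.80  # Step 7: BENIGN should be ~80% of the final dataset


def tune_targets(counts: dict[str, int], benign: str = "BENIGN") -> dict[str, int]:
    """Map each class label to a target sample count."""
    median = statistics.median(counts.values())  # Step 6.2 (counts are Step 6.1)
    targets = {}
    for label, n in counts.items():
        if n > 2 * median:  # Step 6.3: well above the median
            targets[label] = int(UNDER_FACTOR * median)
        elif n < 0.5 * median:  # Step 6.4: well below the median
            targets[label] = int(round(OVER_FACTOR * median))
        else:  # Step 6.5: between the thresholds, keep as-is
            targets[label] = n
    # Step 7: attack classes keep their adjusted sizes (no equalization);
    # BENIGN is resized so that it makes up BENIGN_SHARE of the total.
    attack_total = sum(n for label, n in targets.items() if label != benign)
    targets[benign] = round(attack_total * BENIGN_SHARE / (1 - BENIGN_SHARE))
    return targets


def resample(rows_by_class: dict[str, list], targets: dict[str, int], seed: int = 42) -> list:
    """Draw every class to its target size, then shuffle the result (Step 8)."""
    rng = random.Random(seed)
    out = []
    for label, rows in rows_by_class.items():
        k = targets[label]
        if k <= len(rows):
            out.extend(rng.sample(rows, k))  # undersample without replacement
        else:
            # Assumption: oversample by random duplication (TUNE may differ).
            out.extend(rows + rng.choices(rows, k=k - len(rows)))
    rng.shuffle(out)  # Step 8: remove ordering bias
    return out
```

With hypothetical counts of 50000 (BENIGN), 2500 (DoS Hulk), 1000 (DDoS), 800 (DoS GoldenEye), 600 (DoS SlowHTTPTest), and 100 (DoS Slowloris), the median is 900, so DoS Hulk is capped at 1800, DoS Slowloris is raised to 450, and BENIGN is set to 18600 of 23250 rows (80%). Computing target counts separately from drawing rows makes the size rules easy to inspect before any data is touched.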

Institutions

  • Amrita Vishwa Vidyapeetham

Categories

Computer Science, Network Security, Machine Learning, Computer Forensics, Intrusion Detection, Intrusion Analysis

Licence