CICIDS2017-DOS (MB)
Description
CICIDS2017-DOS (MB) is the moderately balanced (MB) version of the DoS-specific subset derived from the CICIDS2017 dataset. It retains all six traffic categories of the perfectly balanced (PB) version: benign traffic and five DoS attack types (DDoS, DoS GoldenEye, DoS Hulk, DoS Slowloris, and DoS SlowHTTPTest). Whereas the PB dataset equalizes all classes, the MB dataset follows the default behaviour of the TUNE sampling framework. Majority classes are reduced to approximately twice the median class size, and minority classes are moderately increased only when they fall well below the median. The result is a structured but naturally uneven distribution in which every attack category remains present in meaningful proportions without full equalization.
All preprocessing steps used to construct the PB dataset were also applied here. These include cleaning, feature reduction, and conversion of non-numeric attributes (IP addresses and timestamps) to numeric format. The resulting dataset has a controlled yet realistic level of class imbalance. This makes it suitable for evaluating model robustness to non-uniform class distributions and for exploring sampling-aware training approaches in intrusion detection research.
Please cite:
Panigrahi, R., & Borah, S. (2018). A detailed analysis of CICIDS2017 dataset for designing Intrusion Detection Systems. International Journal of Engineering & Technology, 7(3.24), 479-482.
Sharafaldin, I., Lashkari, A. H., & Ghorbani, A. A. (2018). Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization. 4th International Conference on Information Systems Security and Privacy (ICISSP), Portugal, January 2018.
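The default TUNE sizing rule described above can be sketched in Python using only the standard library. The thresholds (2 × median and 0.5 × median) come from Step 6 of the reproduction steps. Several details are illustrative assumptions, not the TUNE framework's actual API or method:
- the oversampling target of ceil(0.5 × median);
- random undersampling without replacement and random oversampling with replacement;
- the function names `tune_target_sizes` and `tune_resample`.

```python
import math
import random
from collections import defaultdict
from statistics import median


def tune_target_sizes(counts):
    """Return a target size per class under the default TUNE rule (sketch).

    counts: mapping of class label -> number of samples.
    Assumption: an oversampled class is raised to ceil(0.5 * median);
    the source only says minority classes are "moderately increased".
    """
    med = median(counts.values())
    targets = {}
    for label, n in counts.items():
        if n > 2 * med:
            # Steps 6.3 and 7: cap majority classes at about 2 x median.
            targets[label] = int(round(2 * med))
        elif n < 0.5 * med:
            # Step 6.4: moderate oversampling (the target is an assumption).
            targets[label] = math.ceil(0.5 * med)
        else:
            # Step 6.5: classes near the median are left unchanged.
            targets[label] = n
    return targets


def tune_resample(records, label_of, seed=42):
    """Resample records to the TUNE targets, then shuffle (Step 8).

    Assumption: plain random undersampling without replacement and
    random oversampling with replacement; TUNE itself may differ.
    """
    by_class = defaultdict(list)
    for rec in records:
        by_class[label_of(rec)].append(rec)

    targets = tune_target_sizes({c: len(rs) for c, rs in by_class.items()})
    rng = random.Random(seed)
    out = []
    for label, rs in by_class.items():
        target = targets[label]
        if target < len(rs):
            out.extend(rng.sample(rs, target))
        else:
            out.extend(rs)
            # Adds nothing when target == len(rs).
            out.extend(rng.choices(rs, k=target - len(rs)))
    rng.shuffle(out)
    return out


if __name__ == "__main__":
    # Toy counts for illustration only, not the real CICIDS2017 class sizes.
    toy = {"Benign": 1000, "DoS Hulk": 300, "DDoS": 120,
           "DoS GoldenEye": 100, "DoS Slowloris": 20, "DoS SlowHTTPTest": 60}
    print(tune_target_sizes(toy))
    # prints {'Benign': 220, 'DoS Hulk': 220, 'DDoS': 120, 'DoS GoldenEye': 100, 'DoS Slowloris': 55, 'DoS SlowHTTPTest': 60}
```

With these toy counts the median is 110. Benign and DoS Hulk are capped at 220, and DoS Slowloris is raised to 55. The other classes keep their sizes, so the result stays uneven rather than equal, which is what separates MB from PB.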
Steps to reproduce
Input: CICIDS2017 raw dataset files
Output: CICIDS2017-DOS (MB) moderately balanced dataset
Step 1: Merge all day-wise CICIDS2017 files into a single dataset.
Step 2: Remove records with missing labels, NULL values, or infinite values.
Step 3: Filter the dataset to retain only the following classes: {Benign, DDoS, DoS Hulk, DoS GoldenEye, DoS Slowloris, DoS SlowHTTPTest}.
Step 4: Remove non-informative and redundant attributes.
Step 5: Convert non-numeric fields to numeric format:
  a) Convert IP addresses to integer format.
  b) Convert timestamps to milliseconds since the epoch.
Step 6: Apply the TUNE sampling framework:
  6.1 Compute the sample count for each class.
  6.2 Calculate the median class size.
  6.3 Perform aggressive undersampling on any class larger than 2 × median.
  6.4 Perform moderate oversampling on any class smaller than 0.5 × median.
  6.5 Keep classes that fall between these thresholds (close to the median) unchanged.
Step 7: Use the default TUNE target size. Majority classes are set to approximately 2 × median, while minority classes remain at their TUNE-adjusted values (no forced equal distribution).
Step 8: Shuffle the dataset to remove ordering bias.
Step 9: Save the output as CICIDS2017-DOS (MB).
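Steps 1, 2, 3, and 5 can be sketched with the Python standard library, as shown below. This is an illustrative sketch, not the authors' pipeline. The following are all assumptions to check against the raw files:
- the column names (`Label`, `Source IP`, `Destination IP`, `Timestamp`);
- the accepted day/month/year timestamp formats;
- reading timestamps as UTC;
- case-insensitive label matching;
- the hypothetical helper names.

Step 4 is left out because the source does not list which attributes were removed.

```python
import csv
import ipaddress
import math
from datetime import datetime, timezone

# Step 3: classes retained in the DoS subset (names as given in this document).
KEEP = {"Benign", "DDoS", "DoS Hulk", "DoS GoldenEye", "DoS Slowloris", "DoS SlowHTTPTest"}
# Hypothetical column names; check them against the raw CSV headers.
LABEL, SRC_IP, DST_IP, TS = "Label", "Source IP", "Destination IP", "Timestamp"
# Assumed day/month/year layouts, tried in order.
TS_FORMATS = ("%d/%m/%Y %H:%M:%S", "%d/%m/%Y %H:%M")
NULL_TOKENS = {"", "null", "nan", "none"}


def read_rows(paths):
    """Step 1: stream the rows of every day-wise CSV file as one sequence."""
    for path in paths:
        with open(path, newline="", encoding="utf-8", errors="replace") as fh:
            # skipinitialspace drops blanks after each comma, in headers and values.
            yield from csv.DictReader(fh, skipinitialspace=True)


def is_clean(row):
    """Step 2: reject rows with a missing label, NULL-like tokens, or non-finite numbers."""
    label = row.get(LABEL)
    if not isinstance(label, str) or not label.strip():
        return False
    for value in row.values():
        # None (short row) or a list (extra fields) counts as malformed.
        text = value.strip() if isinstance(value, str) else ""
        if text.lower() in NULL_TOKENS:
            return False
        try:
            if not math.isfinite(float(text)):
                return False  # e.g. "Infinity"
        except ValueError:
            pass  # non-numeric text such as an IP address or a label
    return True


def ip_to_int(text):
    """Step 5a: IPv4/IPv6 address string -> integer."""
    return int(ipaddress.ip_address(text.strip()))


def ts_to_ms(text):
    """Step 5b: timestamp -> milliseconds since the Unix epoch (read as UTC, an assumption)."""
    for fmt in TS_FORMATS:
        try:
            dt = datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
        return round(dt.replace(tzinfo=timezone.utc).timestamp() * 1000)
    raise ValueError(f"unrecognised timestamp: {text!r}")


def preprocess(rows):
    """Steps 2, 3 and 5 applied row by row; Step 4 is omitted (columns unspecified)."""
    canonical = {name.lower(): name for name in KEEP}  # case-insensitive match (assumption)
    out = []
    for row in rows:
        if not is_clean(row):
            continue
        label = canonical.get(row[LABEL].strip().lower())
        if label is None:
            continue  # Step 3: class outside the DoS subset
        row = dict(row)
        row[LABEL] = label
        row[SRC_IP] = ip_to_int(row[SRC_IP])
        row[DST_IP] = ip_to_int(row[DST_IP])
        row[TS] = ts_to_ms(row[TS])
        out.append(row)
    return out
```

A call such as `preprocess(read_rows(paths))` would then yield cleaned, filtered, numeric rows ready for the TUNE resampling of Steps 6 to 8.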
Institutions
- Amrita Vishwa Vidyapeetham