Botnet Group Activity Dataset

Published: 11 June 2021 | Version 1 | DOI: 10.17632/4vftxh97m8.1
Dandy Pramana Hostiadi,


The dataset comprises bot group activities in 13 scenarios, each with different bot actors and activity patterns. Each dataset scenario has 14 features: start time, duration, protocol, source IP address, source port, direction of transaction, destination IP address, destination port, state of transaction, source TOS byte value, destination TOS byte value, total transaction packet count, total transaction bytes, and source packet transaction, along with one feature as the activity label.
The bot group activity dataset is presented in binetflow file format. Each scenario directory contains several files: dataset_result.binetflow, which holds bot group activities and normal activities with activity labels; dataset_result.botnet_only.binetflow, which holds only bot group activities; dataset_result.normal_only.binetflow, which holds only normal host activities; and dataset_result.without_label.binetflow, which holds bot group activities and normal activities without activity labels.
For data analysis, each scenario is described by its activity time, activity duration, number of normal host activities, number of bot activities, number of bot actors, and total activity records. Graphs of normal and bot activities over time are also provided to show their intensity and periodic behavior.
In particular, this dataset presents bot group activities as a series of interrelated activities. It can therefore be used as research data for detecting periodic and intense bot group activity, and as a knowledge base of various bot activity scenarios. The dataset was produced by adopting botnet behavior from the CTU Dataset [1]. The difference is that the presented dataset exhibits intense and periodic bot activity in every scenario.
Thus, this dataset is suitable for research on detecting bot group activity with a time-based segmentation approach [2]–[4]. In addition, it has a more stable amount of activity per regular interval (1 hour) than the CTU dataset.
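
As an illustration of the time-based segmentation mentioned above, the following minimal sketch reads a labeled binetflow file as comma-separated text and counts bot and normal flows per 1-hour segment. The column names (StartTime, Label, and the rest of the header), the timestamp layout, and the convention that bot labels contain the word "Botnet" are assumptions borrowed from the CTU binetflow convention and are not confirmed by this description; the sample rows are invented for demonstration only. Check the header of dataset_result.binetflow before relying on any of them.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Assumed column names and timestamp layout (CTU-style binetflow header).
# Verify them against the header line of the actual files.
TIME_COL = "StartTime"
LABEL_COL = "Label"
TIME_FORMAT = "%Y/%m/%d %H:%M:%S.%f"


def read_flows(handle):
    """Read comma-separated binetflow records into a list of dicts."""
    return list(csv.DictReader(handle))


def is_bot(flow):
    """Assumption: bot flows carry the substring 'botnet' in their label."""
    return "botnet" in flow[LABEL_COL].lower()


def hourly_counts(flows):
    """Count bot and normal flows per 1-hour segment, keyed by hour start."""
    counts = {}
    for flow in flows:
        start = datetime.strptime(flow[TIME_COL], TIME_FORMAT)
        hour = start.replace(minute=0, second=0, microsecond=0)
        bucket = counts.setdefault(hour, Counter())
        bucket["bot" if is_bot(flow) else "normal"] += 1
    return counts


# Invented sample rows (hypothetical values), only to show the input shape.
SAMPLE = "\n".join([
    "StartTime,Dur,Proto,SrcAddr,Sport,Dir,DstAddr,Dport,State,"
    "sTos,dTos,TotPkts,TotBytes,SrcBytes,Label",
    "2021/06/11 10:05:00.000000,1.5,tcp,192.0.2.10,1025,->,"
    "198.51.100.1,80,SRPA_SPA,0,0,10,900,400,flow=From-Botnet-TCP",
    "2021/06/11 10:40:00.000000,0.2,udp,192.0.2.20,5353,<->,"
    "198.51.100.2,53,CON,0,0,2,200,100,flow=Normal-UDP",
    "2021/06/11 11:05:00.000000,1.4,tcp,192.0.2.10,1026,->,"
    "198.51.100.1,80,SRPA_SPA,0,0,10,900,400,flow=From-Botnet-TCP",
])

segments = hourly_counts(read_flows(io.StringIO(SAMPLE)))
for hour in sorted(segments):
    print(hour, dict(segments[hour]))
```

To run this on a real scenario, replace the io.StringIO(SAMPLE) argument with an open file handle, for example open("dataset_result.binetflow", newline=""). A roughly constant bot count across consecutive hours would be consistent with the periodic behavior described above.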


Steps to reproduce

This dataset is generated by modeling bot activity patterns that adopt botnet behavior from the CTU Dataset [1]. The bot activity model [4] detects both normal and bot activities that are correlated as a group. The detection results are then stored in two knowledge bases, one for normal activities and one for bot activities. Next, the parameters used to obtain the dataset are specified. These parameters are: (i) total time duration, which is the length of the dataset activity in hours; (ii) the number of bots, which is the number of bots in the dataset, represented by the number of bot IP addresses; (iii) type of bot flow, which is the set of correlated activity types taken from the scenario activity data in the bot activity knowledge base; (iv) the number of normal activities, which is the number of normal flow activities taken from the normal activity knowledge base. The parameters differ between dataset scenarios, depending on the adopted CTU dataset scenario. The resulting botnet group activity dataset is generated in binetflow format.
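
The role of the four parameters can be illustrated with a small sketch. This is not the generator used to build the dataset: the ScenarioParams class, the build_scenario function, the knowledge-base structures, the field names, and the rule that every bot emits each selected flow type once per hour are all hypothetical, chosen only to show how the parameters could shape a scenario with periodic bot activity.

```python
import random
from dataclasses import dataclass


@dataclass
class ScenarioParams:
    """Hypothetical container for the four parameters listed above."""
    duration_hours: int      # (i) total time duration in hours
    bot_count: int           # (ii) number of bot IP addresses
    bot_flow_types: list     # (iii) bot flow types to draw from the bot KB
    normal_flow_count: int   # (iv) number of normal flows to draw


def build_scenario(params, bot_kb, normal_kb, seed=0):
    """Assemble an illustrative scenario from two knowledge bases.

    bot_kb maps a flow type to a flow template (dict); normal_kb is a list
    of flow templates. Both structures are assumptions for this sketch.
    """
    rng = random.Random(seed)
    total_seconds = params.duration_hours * 3600
    # Placeholder bot addresses from a documentation-only IP range.
    bot_ips = [f"192.0.2.{i + 1}" for i in range(params.bot_count)]
    records = []

    # Assumed periodicity: each bot emits each selected type once per hour.
    for hour in range(params.duration_hours):
        for ip in bot_ips:
            for flow_type in params.bot_flow_types:
                records.append({
                    **bot_kb[flow_type],
                    "src_ip": ip,
                    "offset_s": hour * 3600 + rng.randrange(3600),
                    "label": "bot",
                })

    # Normal flows are sampled uniformly over the whole duration.
    for _ in range(params.normal_flow_count):
        records.append({
            **rng.choice(normal_kb),
            "offset_s": rng.randrange(total_seconds),
            "label": "normal",
        })

    records.sort(key=lambda record: record["offset_s"])
    return records


# Invented knowledge-base entries, for demonstration only.
BOT_KB = {
    "scan": {"proto": "tcp", "dst_port": 445},
    "c2": {"proto": "tcp", "dst_port": 6667},
}
NORMAL_KB = [
    {"src_ip": "198.51.100.7", "proto": "udp", "dst_port": 53},
    {"src_ip": "198.51.100.8", "proto": "tcp", "dst_port": 443},
]

params = ScenarioParams(duration_hours=2, bot_count=3,
                        bot_flow_types=["scan", "c2"], normal_flow_count=5)
scenario = build_scenario(params, BOT_KB, NORMAL_KB)
print(len(scenario), "records")
```

With these example values the scenario contains 2 × 3 × 2 = 12 bot records plus 5 normal records. Fixing the seed keeps the sketch reproducible; the real scenarios take their parameter values from the adopted CTU scenario, as described above.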