Machine Learning Topic on Intrusion Detection
Description
Intrusion Detection Systems (IDSs) and Intrusion Prevention Systems (IPSs) are crucial tools for defending against sophisticated and constantly evolving network threats. Anomaly-based intrusion detection techniques, however, struggle to obtain consistent and accurate performance evaluations because dependable test and validation datasets are lacking. Our assessment of the eleven datasets that have been available since 1998 shows that most of them are outdated and unreliable. Some exhibit limited traffic diversity and volume, some do not cover the full range of known attacks, and others anonymize packet payload data and therefore cannot reflect current trends. Some also lack an adequate feature set and metadata.

The CICIDS2017 dataset contains benign traffic and up-to-date common attacks, so it resembles real-world data (PCAPs). It also includes the results of the network traffic analysis performed with CICFlowMeter. The flows are labeled based on the time stamp, source and destination IPs, source and destination ports, protocols, and attack type, and the results are stored in CSV files. The definitions of the extracted features are also provided.

Our main focus in constructing this dataset was to generate background traffic that closely resembles real-life scenarios. We used our proposed B-Profile system (Sharafaldin et al., 2016) to profile the abstract behavior of human interactions and to generate naturalistic benign background traffic. For this dataset, we built the abstract behavior of 25 users based on their use of the HTTP, HTTPS, FTP, SSH, and email protocols.

The data capture period started at 9 a.m. on Monday, July 3, 2017, and ended at 5 p.m. on Friday, July 7, 2017, for a total of 5 days. Monday is the normal day and contains only benign traffic. The implemented attacks include Brute Force FTP, Brute Force SSH, Denial of Service (DoS), Heartbleed, Web Attack, Infiltration, Botnet, and Distributed Denial of Service (DDoS). They were executed in both the morning and the afternoon on Tuesday, Wednesday, Thursday, and Friday.

In our recent dataset evaluation framework (Gharib et al., 2016), we identified eleven criteria that are necessary for building a reliable benchmark dataset. None of the previous IDS datasets covers all eleven criteria.
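
Because the labeled flows are distributed as CSV files, a first sanity check is often to count how many flows carry each label. The following is a minimal sketch of that check, not the dataset authors' tooling. The column names, the label strings, and the sample rows are assumptions made up for illustration (the addresses come from a documentation-only range), and the header names are stripped of surrounding whitespace in case an exported file pads them.

```python
import csv
import io
from collections import Counter

# Made-up sample rows for illustration only; a real flow file has many more
# columns, and its exact header names and label strings may differ.
SAMPLE_CSV = (
    "Timestamp, Source IP, Destination IP, Source Port, Destination Port, Protocol, Label\n"
    "2017-07-03 09:05:00,192.0.2.10,192.0.2.1,49188,80,6,BENIGN\n"
    "2017-07-03 09:06:10,192.0.2.11,192.0.2.1,49201,443,6,BENIGN\n"
    "2017-07-04 09:30:00,192.0.2.50,192.0.2.1,51000,21,6,Brute Force FTP\n"
    "2017-07-05 10:15:00,192.0.2.50,192.0.2.1,51200,80,6,DoS\n"
    "2017-07-05 10:16:00,192.0.2.12,192.0.2.1,49300,22,6,BENIGN\n"
)


def count_labels(csv_file, label_column="Label"):
    """Count flows per label in an open CSV file object.

    `label_column` is an assumed header name; adjust it to match the file.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # Strip whitespace from header names and values before the lookup.
        cleaned = {
            key.strip(): (value or "").strip()
            for key, value in row.items()
            if key is not None
        }
        counts[cleaned[label_column]] += 1
    return counts


def benign_fraction(counts, benign_label="BENIGN"):
    """Return the share of flows whose label equals `benign_label`."""
    total = sum(counts.values())
    return counts[benign_label] / total if total else 0.0


counts = count_labels(io.StringIO(SAMPLE_CSV))
for label, n in counts.most_common():
    print(f"{label}: {n}")
print(f"benign share: {benign_fraction(counts):.2f}")
```

To run the same check on a downloaded file, replace `io.StringIO(SAMPLE_CSV)` with a file opened via `open(path, newline="")`. A day that contains only benign traffic, as described for Monday, should yield a benign share of 1.0.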