Dataset of intrusion detection alerts from a sharing platform
The dataset consists of a main file with the intrusion detection alerts and four auxiliary files with enriched data. The alerts were collected from the SABU alert sharing platform over one week and are stored in the IDEA format. Almost 12 million alerts were collected from 34 intrusion detection systems, honeypots, and other data sources deployed in 3 distinct organizations. The IP addresses, hostnames, URLs, and other identifiers in the alerts are anonymized, but the information in the auxiliary files allows for the profiling of malicious actors.

The auxiliary files contain information on over 1.7 million IP addresses that appear in the alerts; IP addresses are the most frequent identifiers of attackers and victims in the observed events. Reputation scores, geolocation, and data from a PassiveDNS system are provided. The reputation scores include information on the presence of the IP addresses on publicly available blacklists and the results of scans by Internet-wide scanners. The geolocation provides the approximate geographical locations of the IP addresses; a data layer for a common geographical information system is also provided. The PassiveDNS data take the form of feature vectors of the domain names that the IP addresses were translated to at the time of their involvement in malicious activities.

The list of files is as follows:
dataset.idea.zip - compressed dataset.idea file with the alerts in IDEA format, one alert per line
Aux_1A_Geolocation-csv - CSV file with geolocation information
Aux_1B_GIS_data.zip - compressed archive of spatial data for use with the ArcGIS geographical information system
Aux_2_Passive_DNS - CSV file with characteristics of DNS records for the IP addresses in the data, obtained via a PassiveDNS system
Aux_3_Enrichment - compressed archive of various other enrichments of IP addresses, split by day; see the README in the archive
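Because the main file stores one alert per line, it can be processed as a stream without loading almost 12 million alerts into memory. The following Python sketch illustrates one way to do that. It rests on several assumptions that are not confirmed by the description above: that each line is a JSON object (IDEA is a JSON-based format), that the archive member is named dataset.idea, and that alerts carry a "Category" field holding a list of strings. The function names iter_alerts, count_categories, and open_alert_lines are hypothetical helpers introduced only for this illustration.

```python
import io
import json
import zipfile
from collections import Counter
from typing import Iterable, Iterator


def iter_alerts(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed alert per non-empty line (assumes one JSON object per line)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_categories(alerts: Iterable[dict]) -> Counter:
    """Count values of the "Category" field (assumed to be a list of strings)."""
    counts: Counter = Counter()
    for alert in alerts:
        categories = alert.get("Category", [])
        if isinstance(categories, str):
            # Tolerate a single string instead of a list.
            categories = [categories]
        counts.update(categories)
    return counts


def open_alert_lines(zip_path: str, member: str = "dataset.idea") -> Iterator[str]:
    """Stream text lines from the compressed file (member name is an assumption)."""
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            yield from io.TextIOWrapper(raw, encoding="utf-8")


if __name__ == "__main__":
    # Made-up sample lines for illustration only; not taken from the dataset.
    sample = [
        '{"ID": "a", "Category": ["Recon.Scanning"]}',
        '{"ID": "b", "Category": ["Recon.Scanning", "Attempt.Login"]}',
    ]
    print(count_categories(iter_alerts(sample)).most_common())
```

To run the same count against the real archive, the in-memory sample could be replaced with count_categories(iter_alerts(open_alert_lines("dataset.idea.zip"))), provided the assumptions above hold for the downloaded files.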