DGTA-BENCH - Domain Generation and Tunneling Algorithms for Benchmark

Published: 6 December 2021 | Version 1 | DOI: 10.17632/2wzf9bz7xr.1
Yakov Bubnov


Intrusion detection in computer networks remains an important problem in computer science. To enforce network security, special instruments are embedded in operating systems and specialized software is developed. Such applications aim to protect local networks and personal computers from cyberattacks, data theft, spam advertising unwanted products, and dangerous hidden drive-by objects that infect a visitor’s machine with malware. As a result, there is growing interest in developing systems that protect the end user from potential attacks.

The provided dataset is derived from multiple sources to give Machine Learning researchers and Cyber Security analysts a ready-to-use benchmark. It is motivated by the need for a well-defined common framework against which publications on intrusion detection through the Domain Name System (DNS) can be compared. It is a collection of domain names produced by Domain Generation Algorithms (DGA) and Domain Tunneling Algorithms (DTA), together with legitimate DNS names: names used to access popular Internet resources, safe domains hosted by DNS providers, and domains of Content Delivery Networks (CDN). It contains 1.65M labeled domain names divided into 55 classes: 50 DGA classes, 4 DTA classes, and 1 legitimate class. The dataset is stored as a single Parquet file, which can be read from Python with Pandas, PyArrow, or TensorFlow I/O.
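As an illustration of reading the file with Pandas, the following is a minimal sketch rather than an official loader. The column names are not stated in this description, so the label column is passed in as a parameter; the helper names `load_dataset` and `class_summary` are hypothetical. Reading Parquet with Pandas also assumes that a Parquet engine such as PyArrow is installed.

```python
import pandas as pd


def load_dataset(path: str) -> pd.DataFrame:
    """Read the Parquet file at `path` into a DataFrame.

    Pandas delegates to an installed Parquet engine (for example PyArrow).
    """
    return pd.read_parquet(path)


def class_summary(df: pd.DataFrame, label_column: str) -> pd.Series:
    """Count rows per class, largest class first.

    `label_column` is an assumption: inspect `df.columns` to find the
    actual name of the column holding the class labels.
    """
    return df[label_column].value_counts()
```

A typical session would call `load_dataset` with the local path of the downloaded file, print `df.columns` to see the actual schema, and then pass the label column to `class_summary` to check the class distribution against the 55 classes listed above.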


Steps to reproduce

Data sources are linked to the publication below.


Belorusskij gosudarstvennyi universitet informatiki i radioelektroniki


Cybersecurity, Natural Language Processing, Machine Learning, Networking