Benign and malicious domains based on DNS logs

Published: 31-03-2021 | Version 3 | DOI: 10.17632/623sshkdrz.3
Claudio Marques


The dataset is intended for supervised machine learning analysis of malicious and non-malicious domain names. It was created from scratch using publicly available DNS logs of both malicious and non-malicious domain names. Using the domain name as input, 34 features were obtained. Some features, such as the domain name itself, its entropy, the number of strange characters, and the domain name length, were obtained directly from the domain name. Other features, such as the domain name creation date, IP address, open ports, and geolocation, were obtained through data enrichment processes (e.g. OSINT). The dataset contains data from 90,000 domain names and is balanced: 50% are non-malicious and 50% are malicious.
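As an illustration of how features of the first kind can be derived from the domain name alone, the following is a minimal sketch, not the procedure used to build the dataset. It assumes that entropy means character-level Shannon entropy and that a "strange character" is anything other than an ASCII letter, a digit, a dot, or a hyphen; the dataset's actual definitions may differ. The function names and dictionary keys are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(name: str) -> float:
    """Character-level Shannon entropy of a string, in bits per character."""
    if not name:
        return 0.0
    total = len(name)
    counts = Counter(name)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def lexical_features(domain: str) -> dict:
    """Compute a few lexical features directly from a domain name.

    Assumption: "strange characters" are characters other than ASCII
    letters, digits, dots, and hyphens. This is an illustrative choice,
    not the dataset's documented definition.
    """
    name = domain.strip().lower().rstrip(".")
    strange = sum(
        1
        for ch in name
        if not (ch.isascii() and ch.isalnum()) and ch not in ".-"
    )
    return {
        "domain": name,
        "length": len(name),
        "entropy": shannon_entropy(name),
        "strange_chars": strange,
    }


if __name__ == "__main__":
    for d in ["example.com", "xk2-9q_z!.net"]:
        print(lexical_features(d))
```

Features of the second kind (creation date, IP, open ports, geolocation) depend on external lookups and are not covered by this sketch.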