Goodware Dataset used in "Disarming Visualization-Based Approaches in Malware Detection Systems"
Description
This is one of the datasets used in the experiments of the paper: Fascí, L. S., Fisichella, M., Lax, G., & Qian, C. (2023). Disarming visualization-based approaches in malware detection systems. Computers & Security, 126, 103062. https://www.sciencedirect.com/science/article/pii/S0167404822004540 It contains 2,000 goodware files.

The password to open the archive is: Ben1gN@D$!?

As described in the paper, the dataset used there consists of both malware and goodware. The malware samples are from the MalImg dataset (Nataraj, Karthikeyan, Jacob, & Manjunath, 2011). As for goodware, no datasets and no direct reliable sources of safe software were found. Therefore, we collected .exe files both by web scraping on different platforms (DriverPack Solution, Filehippo, Major Geeks, Portable Freeware, Softonic) and by using the executables of two virtual machines right after installing the 32-bit versions of Windows 8 and Windows 10, respectively. To ensure that the collected .exe files are not malware, we scanned each file with VirusTotal, as done by Pinhero et al. (2021). In total, we collected about 2,000 samples, which are available at Repository (2022).
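Anyone who wants to repeat a check like the VirusTotal scan described above first needs an inventory of the extracted files. The following Python sketch is not part of the dataset or of the authors' pipeline; it assumes the archive has already been extracted (with the password above) into a local directory, and the function name `summarize_executables` is hypothetical. For every .exe file it records whether the file begins with the `MZ` signature of a DOS/PE executable and computes its SHA-256 digest, one of the hash types VirusTotal lets you search by.

```python
import hashlib
from pathlib import Path


def summarize_executables(root):
    """Return (sha256_hex, has_mz_signature, path) for every .exe under root.

    Hypothetical helper: assumes the archive was already extracted into `root`.
    The MZ check only looks at the first two bytes (DOS/PE header magic);
    it says nothing about whether a file is benign or malicious.
    """
    results = []
    # Walk the directory tree recursively, in a stable (sorted) order.
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() != ".exe":
            continue
        digest = hashlib.sha256()
        with path.open("rb") as fh:
            header = fh.read(2)
            digest.update(header)
            # Hash the rest of the file in 64 KiB chunks to bound memory use.
            for chunk in iter(lambda: fh.read(1 << 16), b""):
                digest.update(chunk)
        results.append((digest.hexdigest(), header == b"MZ", path))
    return results
```

For example, `summarize_executables("goodware/")` (the directory name is a placeholder) returns one tuple per file; the hashes can then be looked up on VirusTotal. A missing `MZ` signature marks a file worth inspecting but does not, by itself, indicate malware.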
Funding
European Commission
871042