Ransomware Printable Character N-gram Feature Dataset
Description
This dataset was generated for the research paper "Zero-Day Ransomware Family Detection Based on Printable Character Analysis and Machine Learning", published in the Electronic Journal of Scientific Initiation in Computing (Revista Eletrônica de Iniciação Científica em Computação – REIC), vol. 23 (2025), doi: http://doi.org/10.5753/reic.2025.6021.

It contains structural features in the form of printable-character 3-, 4-, and 5-grams extracted from 2,675 binary executable samples. The training and validation set consists of 2,157 samples (80%): 1,023 ransomware samples from 25 relevant families and 1,134 goodware samples. The testing set consists of 518 samples (20%): 385 ransomware samples from 15 recent families and 133 goodware samples. The ransomware families in the testing set (IDs 26 to 40) do not appear in the training and validation set (IDs 1 to 25).

The CSV file columns are sample ID, filename, target class (RG: 0 for goodware, 1 for ransomware), family ID, and 2,000 numerical columns (binary features), as follows:

| Subset | ID | filename | RG | family | 2000 Features |
| Training Goodware | 10000 to 11133 | Executable file name (.exe) | 0 | 0 | Binary features |
| Testing Goodware | 12000 to 12132 | Executable file name (.exe) | 0 | 0 | Binary features |
| Training Ransomware | 20000 to 21022 | SHA-256 hash | 1 | 1 to 25 | Binary features |
| Testing Ransomware | 22000 to 22384 | SHA-256 hash | 1 | 26 to 40 | Binary features |

Family IDs, training and validation set: Avaddon (1), Babuk (2), Blackmatter (3), Conti (4), Darkside (5), Dharma (6), Doppelpaymer (7), Exorcist (8), Gandcrab (9), Lockbit (10), Makop (11), Maze (12), Mountlocker (13), Nefilim (14), Netwalker (15), Phobos (16), Pysa (17), Ragnarok (18), RansomeXX (19), Revil (20), Ryuk (21), Stop (22), Thanos (23), Wastedlocker (24), Zeppelin (25).

Family IDs, testing set: AvosLocker (26), BianLian (27), BlackBasta (28), BlackByte (29), BlackCat (30), BlueSky (31), Clop (32), Hive (33), HolyGhost (34), Karma (35), Lorenz (36), Maui (37), Night Sky (38), PlayCrypt (39), Quantum (40).
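Because each subset occupies its own ID range, the four subsets can be recovered from the ID column alone. The following is a minimal sketch of that idea using only the Python standard library. It assumes the CSV has a header row with columns named ID, filename, RG, and family; the exact header spelling is an assumption, and the in-memory sample rows and the feature column names f1 and f2 are made up for illustration.

```python
import csv
import io

# Inclusive ID ranges, taken from the table in the description.
SUBSET_RANGES = {
    "training_goodware": (10000, 11133),
    "testing_goodware": (12000, 12132),
    "training_ransomware": (20000, 21022),
    "testing_ransomware": (22000, 22384),
}


def subset_of(sample_id):
    """Return the subset name for a sample ID, or None if out of range."""
    for name, (low, high) in SUBSET_RANGES.items():
        if low <= sample_id <= high:
            return name
    return None


def split_rows(csv_file):
    """Group CSV rows by subset, using only the ID column.

    Assumes a header row containing a column named "ID"
    (hypothetical spelling; adjust to the real file).
    """
    groups = {name: [] for name in SUBSET_RANGES}
    for row in csv.DictReader(csv_file):
        name = subset_of(int(row["ID"]))
        if name is not None:
            groups[name].append(row)
    return groups


# Made-up sample with two feature columns instead of 2,000.
SAMPLE = """ID,filename,RG,family,f1,f2
10000,a.exe,0,0,1,0
12000,b.exe,0,0,0,0
20000,hash1,1,4,1,1
22384,hash2,1,40,0,1
"""

groups = split_rows(io.StringIO(SAMPLE))
counts = {name: len(rows) for name, rows in groups.items()}
print(counts)
```

To use it on the real file, pass an open file object to split_rows instead of the in-memory sample, for example open(path, newline="") with the path of the downloaded CSV.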
Steps to reproduce
For details on how the sample collection was selected, please refer to the Steps to Reproduce section of the dataset from our previous work: https://data.mendeley.com/datasets/yzhcvn7sj5/

A complete description of this dataset can be found in the paper: Gonçalves, K.; Silva, F.; Moreira, D.; Moreira, C. "Zero-Day Ransomware Family Detection Based on Printable Character Analysis and Machine Learning". Revista Eletrônica de Iniciação Científica em Computação, vol. 23 (2025), doi: http://doi.org/10.5753/reic.2025.6021

Please cite our work when using this dataset.
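As a rough illustration of what a printable-character n-gram feature is, the sketch below extracts 3-, 4-, and 5-grams from runs of printable bytes and marks each n-gram of a given vocabulary as present (1) or absent (0). This is not the authors' pipeline: the definition of "printable" (ASCII letters, digits, punctuation, and space), the presence/absence encoding, and the names printable_runs, ngrams, and binary_features are all assumptions made for this example. How the 2,000 features were selected is described in the paper and is not reproduced here.

```python
import string

# Assumption: "printable" means ASCII letters, digits, punctuation and space.
# Tab, newline and other whitespace control characters are excluded.
PRINTABLE = set(string.printable.encode("ascii")) - set(b"\t\n\r\x0b\x0c")


def printable_runs(data, min_len=3):
    """Yield runs of consecutive printable bytes of at least min_len."""
    run = bytearray()
    for byte in data:
        if byte in PRINTABLE:
            run.append(byte)
        else:
            if len(run) >= min_len:
                yield bytes(run)
            run.clear()
    if len(run) >= min_len:
        yield bytes(run)


def ngrams(data, sizes=(3, 4, 5)):
    """Collect the set of printable-character n-grams found in data."""
    found = set()
    for run in printable_runs(data, min_len=min(sizes)):
        for n in sizes:
            for i in range(len(run) - n + 1):
                found.add(run[i:i + n].decode("ascii"))
    return found


def binary_features(data, vocabulary):
    """Return 1 or 0 per vocabulary n-gram: present or absent in data."""
    present = ngrams(data)
    return [1 if gram in present else 0 for gram in vocabulary]


# Made-up byte string standing in for the contents of an executable.
example = b"\x00\x01Crypt\x00ab\xff"
print(sorted(ngrams(example)))
print(binary_features(example, ["Cry", "rypt", "abc"]))
```

For a real file, the bytes would come from reading the executable in binary mode; handle malware samples only in an isolated analysis environment.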