Ransomware Combined Structural Feature Dataset
Description
This dataset contains several structural features extracted from 2675 binary executable samples. The training and validation set consists of 2157 samples (80%): 1023 ransomware samples belonging to 25 relevant families and 1134 goodware samples. The testing set consists of 518 samples (20%): 385 ransomware samples belonging to 15 recent families and 133 goodware samples.

The CSV file columns are sample ID, filename, target class (GR), family ID, and numerical columns (features), as follows:

| Subset | ID | filename | GR | family | Features |
| Training Goodware | 10000 to 11133 | Their name.exe | 0 | 0 | Numerical features |
| Testing Goodware | 12000 to 12132 | Their name.exe | 0 | 0 | Numerical features |
| Training Ransomware | 20000 to 21022 | Their SHA-256 hash | 1 | 1-25 family IDs | Numerical features |
| Testing Ransomware | 22000 to 22384 | Their SHA-256 hash | 1 | 26-40 family IDs | Numerical features |

Options:
1) The dataset split into individual feature types without preprocessing: headers, imported DLLs, function calls, entropy of sections, and 3-, 4-, and 5-gram opcode frequencies.
2) The combined datasets, which include the headers, imported DLLs, function calls, and entropy of sections feature sets, with and without the 3-gram feature set, after the preprocessing step described in our research paper "A Comprehensive Analysis Combining Structural Features for Detection of New Ransomware Families," published in the Journal of Information Security and Applications.

Note: The preprocessing step primarily involved merging similar APIs, selecting features, and normalizing the features based on their maximum and minimum values, considering only the training data. Some features exist exclusively in the test data, with zero occurrences in the training samples. For accurate testing, it is advisable to exclude these features from the training set.
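The note on preprocessing can be illustrated with a minimal sketch. It assumes each sample is held as a dictionary of numerical features keyed by sample ID; the feature names `f_a` and `f_b`, the helper names, and the toy values are hypothetical, and the sketch covers only the min-max scaling and the exclusion of features that never occur in training, not the API merging or feature selection described in the paper.

```python
from typing import Dict, List, Tuple

Row = Dict[str, float]


def is_training(sample_id: int) -> bool:
    """Training ID ranges taken from the table: goodware and ransomware."""
    return 10000 <= sample_id <= 11133 or 20000 <= sample_id <= 21022


def fit_min_max(train_rows: List[Row], features: List[str]) -> Dict[str, Tuple[float, float]]:
    """Collect (min, max) per feature using training rows only.

    Features that are zero in every training row are left out, so they
    are excluded from both training and testing.
    """
    stats: Dict[str, Tuple[float, float]] = {}
    for name in features:
        values = [row[name] for row in train_rows]
        lo, hi = min(values), max(values)
        if lo == 0 and hi == 0:
            continue  # zero occurrences in training: drop the feature
        stats[name] = (lo, hi)
    return stats


def transform(row: Row, stats: Dict[str, Tuple[float, float]]) -> Row:
    """Scale one row with the training statistics.

    Test values may fall outside [0, 1] when they exceed the training range.
    """
    scaled: Row = {}
    for name, (lo, hi) in stats.items():
        span = hi - lo
        scaled[name] = (row[name] - lo) / span if span else 0.0
    return scaled


# Toy data (hypothetical values): two training samples and one testing sample.
samples = {
    10000: {"f_a": 2.0, "f_b": 0.0},
    20000: {"f_a": 6.0, "f_b": 0.0},
    22000: {"f_a": 4.0, "f_b": 3.0},
}
train = [feats for sid, feats in samples.items() if is_training(sid)]
stats = fit_min_max(train, ["f_a", "f_b"])
scaled_test = transform(samples[22000], stats)
print(scaled_test)
```

Because `f_b` is zero in every training row, it is dropped before scaling, and the testing sample is scaled with the training minimum and maximum of `f_a` only. The combined datasets are already preprocessed, so a step like this would only apply to the individual, unprocessed feature files.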
Family IDs:
Training families (1-25): Avaddon 1, Babuk 2, Blackmatter 3, Conti 4, Darkside 5, Dharma 6, Doppelpaymer 7, Exorcist 8, Gandcrab 9, Lockbit 10, Makop 11, Maze 12, Mountlocker 13, Nefilim 14, Netwalker 15, Phobos 16, Pysa 17, Ragnarok 18, RansomeXX 19, Revil 20, Ryuk 21, Stop 22, Thanos 23, Wastedlocker 24, Zeppelin 25.
Testing families (26-40): AvosLocker 26, BianLian 27, BlackBasta 28, BlackByte 29, BlackCat 30, BlueSky 31, Clop 32, Hive 33, HolyGhost 34, Karma 35, Lorenz 36, Maui 37, Night Sky 38, PlayCrypt 39, Quantum 40.
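For convenience when reading the GR and family columns, the family list can be turned into a lookup table. The following is a small sketch; the names `FAMILY_BY_ID` and `describe` are illustrative and not part of the dataset.

```python
# Family names in ID order (1 to 40), copied from the list above.
FAMILY_NAMES = [
    "Avaddon", "Babuk", "Blackmatter", "Conti", "Darkside", "Dharma",
    "Doppelpaymer", "Exorcist", "Gandcrab", "Lockbit", "Makop", "Maze",
    "Mountlocker", "Nefilim", "Netwalker", "Phobos", "Pysa", "Ragnarok",
    "RansomeXX", "Revil", "Ryuk", "Stop", "Thanos", "Wastedlocker",
    "Zeppelin", "AvosLocker", "BianLian", "BlackBasta", "BlackByte",
    "BlackCat", "BlueSky", "Clop", "Hive", "HolyGhost", "Karma", "Lorenz",
    "Maui", "Night Sky", "PlayCrypt", "Quantum",
]
FAMILY_BY_ID = {i: name for i, name in enumerate(FAMILY_NAMES, start=1)}


def describe(gr: int, family_id: int) -> str:
    """Return a readable label for one row's GR and family values."""
    if gr == 0:
        return "goodware"  # goodware rows carry family ID 0
    split = "training" if family_id <= 25 else "testing"
    return f"{FAMILY_BY_ID[family_id]} ({split} family)"


print(describe(1, 10))
print(describe(1, 38))
```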
Steps to reproduce
Training Data: Recent reports from seven global cybersecurity vendors list a wide diversity of active and dangerous ransomware families. We used two criteria to select relevant ransomware families for the dataset: 1) the family was first observed in 2020 or later, or 2) it appeared in at least two of the seven reports. We assigned ransomware samples to families using three criteria based on vendor engine detections from VirusTotal (https://www.virustotal.com): 1) at least 45 engines flagged the sample as malicious, 2) at least 15 engines flagged it as ransomware, and 3) at least ten engines named the same family.

Appendix A.1: Vendor; report title; release date.
1) Cyber Security Networks; Ransomware: Through the Lens of Threat and Vulnerability Management; Feb-2022.
2) EmsiSoft; Ransomware statistics for 2021: Year in summary; Jan-2022.
3) Sophos; Sophos 2022 Threat Report: Interrelated threats target an interdependent world; Nov-2021.
4) McAfee; Advanced Threat Research Report; Oct-2021.
5) VirusTotal; Ransomware in a Global Context; Oct-2021.
6) Palo Alto Networks; Unit 42 - Ransomware Threat Report 2021; Apr-2021.
7) Group-IB; Ransomware Uncovered 2020-2021; Mar-2021.

Test Data: The selection of the testing ransomware families was based on five more recent reports. We downloaded samples of any reported family that did not appear in the previous reports.

Appendix A.2: Vendor; report title; release date.
1) Sophos; Sophos 2023 Threat Report: Maturing criminal marketplaces present new challenges to defenders; Nov-2022.
2) Cyber Security Networks; Ransomware: Through the Lens of Threat and Vulnerability Management Index Update Q2-Q3 2022; Aug-2022.
3) Kroll; Q2 2022 Threat Landscape: Ransomware Returns, Healthcare Hit; Aug-2022.
4) Zscaler; 2022 ThreatLabz State of Ransomware Report; Jun-2022.
5) Cyber Security Networks; Ransomware: Through the Lens of Threat and Vulnerability Management Index Update Q1 2022; May-2022.

All ransomware samples were downloaded from the VirusShare (https://virusshare.com) and Hybrid-Analysis (https://www.hybrid-analysis.com) databases. These files are Windows executable binaries with minimum and maximum sizes of 14.8 kB and 12.2 MB, respectively.

Password for the .zip file: DANGER
We hold no liability for misuse.

To diversify the non-malicious executable files, we collected samples from the PortableApps (https://portableapps.com) and Softonic (https://en.softonic.com) databases and from a standard Windows 10 system, as well as by searching the web for no-install software. All of these were scanned using ESET NOD32 Antivirus (https://www.eset.com) version 16.0.26.0 to ensure their safety.

A complete description of this dataset can be found in the paper "Moreira, C.C., Moreira, D.C., Sales Jr., C. de S. de: A Comprehensive Analysis Combining Structural Features for Detection of New Ransomware Families. Journal of Information Security and Applications, v. 81, art. 103716 (2024), DOI: https://doi.org/10.1016/j.jisa.2024.103716". Please reference our work when using this dataset.
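The three VirusTotal-based criteria used to assign samples to families (listed under Training Data) can be sketched as a simple voting rule. This is an illustration under stated assumptions, not the authors' tooling: the input is assumed to be one detection label per engine that flagged the sample as malicious, a label is assumed to indicate ransomware when it contains the substring "ransom", and family names are matched as case-insensitive substrings. The function name `label_sample` and the example labels are hypothetical, and the sketch does not reproduce the VirusTotal API response format.

```python
from collections import Counter
from typing import List, Optional


def label_sample(
    detections: List[str],
    known_families: List[str],
    min_malicious: int = 45,
    min_ransomware: int = 15,
    min_family: int = 10,
) -> Optional[str]:
    """Return the family name if all three thresholds are met, else None.

    detections: one label string per engine that flagged the sample
    as malicious (assumed input format).
    """
    # Criterion 1: enough engines flag the sample as malicious.
    if len(detections) < min_malicious:
        return None
    lowered = [label.lower() for label in detections]
    # Criterion 2: enough labels mention ransomware (substring assumption).
    if sum("ransom" in label for label in lowered) < min_ransomware:
        return None
    # Criterion 3: enough engines name the same family.
    votes: Counter = Counter()
    for label in lowered:
        for family in known_families:
            if family.lower() in label:
                votes[family] += 1
                break  # count at most one family per engine
    if not votes:
        return None
    family, count = votes.most_common(1)[0]
    return family if count >= min_family else None


# Hypothetical labels: 47 detections, 17 mention ransomware, 12 name Conti.
example = (
    ["Ransom.Conti"] * 12
    + ["Trojan.Ransom.Generic"] * 5
    + ["Trojan.Generic"] * 30
)
result = label_sample(example, ["Conti", "Ryuk"])
too_few = label_sample(example[:40], ["Conti", "Ryuk"])
print(result, too_few)
```

Substring matching is a deliberate simplification; short family names such as "Stop" or "Maze" could match unrelated labels, so a real labeling pipeline would need stricter parsing of engine verdicts.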