Ransomware PE Header Feature Dataset
This dataset contains the headers of 2157 binary executable samples: 1134 legitimate software (goodware) and 1023 ransomware, the latter grouped into 25 ransomware families. Each sample is represented by the raw content of its PE header (the first 1024 bytes of the file). The CSV file columns are the sample ID, filename, target class (GR), family ID, and numerical columns 0 to 1023, as follows:

| | ID | filename | GR | family | 0 - 1023 |
|---|---|---|---|---|---|
| Goodware | 10000 to 11133 | File name (name.exe) | 0 | 0 | Numerical features ranging from 0 to 255 |
| Ransomware | 20000 to 21022 | SHA-256 hash of the file | 1 | 1 to 25 (family ID) | Numerical features ranging from 0 to 255 |

Family IDs: Avaddon (1), Babuk (2), Blackmatter (3), Conti (4), Darkside (5), Dharma (6), Doppelpaymer (7), Exorcist (8), Gandcrab (9), Lockbit (10), Makop (11), Maze (12), Mountlocker (13), Nefilim (14), Netwalker (15), Phobos (16), Pysa (17), Ragnarok (18), RansomEXX (19), Revil (20), Ryuk (21), Stop (22), Thanos (23), Wastedlocker (24), Zeppelin (25).

A complete description of this dataset can be found in the paper "Moreira, C.C., Moreira, D.C., Sales Jr., C. de S. de: Improving Ransomware Detection based on Portable Executable Header using Xception Convolutional Neural Network. Computers & Security, 103265 (2023), DOI: https://doi.org/10.1016/j.cose.2023.103265, URL: https://www.sciencedirect.com/science/article/pii/S016740482300175X". Please cite our work when using this dataset.
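The column layout described above can be read with the Python standard library alone. The following is a minimal sketch, not the authors' tooling: it assumes the CSV header row uses exactly the names `ID`, `filename`, `GR`, `family`, and `0` to `1023`, which is inferred from the table and may differ in the published file. The function name `load_samples` and the in-memory demo row are hypothetical.

```python
import csv
import io

N_BYTES = 1024  # feature columns are assumed to be named "0" .. "1023"

# Family IDs as listed in the description; 0 is used for goodware.
FAMILIES = {
    0: "Goodware", 1: "Avaddon", 2: "Babuk", 3: "Blackmatter", 4: "Conti",
    5: "Darkside", 6: "Dharma", 7: "Doppelpaymer", 8: "Exorcist",
    9: "Gandcrab", 10: "Lockbit", 11: "Makop", 12: "Maze",
    13: "Mountlocker", 14: "Nefilim", 15: "Netwalker", 16: "Phobos",
    17: "Pysa", 18: "Ragnarok", 19: "RansomEXX", 20: "Revil", 21: "Ryuk",
    22: "Stop", 23: "Thanos", 24: "Wastedlocker", 25: "Zeppelin",
}


def load_samples(handle):
    """Yield (sample_id, filename, is_ransomware, family_id, header_bytes).

    `handle` is any text file object positioned at the CSV header row.
    """
    for row in csv.DictReader(handle):
        features = [int(row[str(i)]) for i in range(N_BYTES)]
        # Every feature is a raw byte value, so it must lie in 0..255.
        if any(not 0 <= value <= 255 for value in features):
            raise ValueError(f"sample {row['ID']}: feature outside 0..255")
        yield (int(row["ID"]), row["filename"], int(row["GR"]),
               int(row["family"]), features)


# Demo on a synthetic one-row CSV (made-up values, not taken from the dataset).
header = ["ID", "filename", "GR", "family"] + [str(i) for i in range(N_BYTES)]
demo_row = (["20000", "placeholder-sha256", "1", "4"]
            + ["77", "90"] + ["0"] * (N_BYTES - 2))
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerow(demo_row)
buffer.seek(0)

samples = list(load_samples(buffer))
sample_id, filename, is_ransomware, family_id, header_bytes = samples[0]
print(sample_id, FAMILIES[family_id], len(header_bytes))
```

To read the real file, replace the in-memory buffer with `open(path, newline="")`, where `path` points to the downloaded CSV.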
Steps to reproduce
Recent reports from seven global cybersecurity vendors list a wide variety of ransomware families that remain active and dangerous (vendor; report title; URL; release date):
1) Cyber Security Works; Ransomware: Through the Lens of Threat and Vulnerability Management; https://cybersecurityworks.com/howdymanage/uploads/file/ransomware-_-2022-spotlight-report_compressed.pdf; Feb-2022.
2) Emsisoft; Ransomware statistics for 2021: Year in summary; https://blog.emsisoft.com/en/40833/ransomware-statistics-for-2021-year-in-summary/; Jan-2022.
3) Sophos; Sophos 2022 Threat Report: Interrelated threats target an interdependent world; https://assets.sophos.com/X24WTUEQ/at/b739xqx5jg5w9w7p2bpzxg/sophos-2022-threat-report.pdf; Nov-2021.
4) McAfee; Advanced Threat Research Report; https://www.mcafee.com/enterprise/en-us/assets/reports/rp-threats-oct-2021.pdf; Oct-2021.
5) VirusTotal; Ransomware in a Global Context; https://storage.googleapis.com/vtpublic/vt-ransomware-report-2021.pdf; Oct-2021.
6) Palo Alto Networks; 2021 Unit 42 Ransomware Threat Report: Understand trends and tactics to bolster defenses; https://www.paloaltonetworks.com/resources/research/unit42-ransomware-threat-report-2021; Apr-2021.
7) Group-IB; Ransomware Uncovered 2020-2021; https://explore.group-ib.com/ransomware-reports/ransomware_uncovered_2020; Mar-2021.

We used two criteria to select the most recent and relevant ransomware families for the dataset: 1) the family was first observed in 2020 or later, or 2) it appeared in at least two of the seven reports. We assigned ransomware samples to families using three criteria based on the verdicts of the antivirus engines on VirusTotal (https://www.virustotal.com): 1) at least 45 engines flagged the sample as malicious, 2) at least 15 flagged it as ransomware, and 3) at least ten named the same family. All ransomware samples were downloaded from the VirusShare (https://virusshare.com) and Hybrid-Analysis (https://www.hybrid-analysis.com) databases.
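The two sets of criteria above can be expressed as simple predicates. This is an illustrative sketch only: the function names and the `(is_malicious, is_ransomware, family_name)` tuple per engine are hypothetical and do not reflect the VirusTotal API schema or the authors' scripts. It also assumes that the three labeling criteria must all hold together and that the family with the most nominations is the one assigned.

```python
from collections import Counter


def select_family(first_seen_year, report_count):
    """Family selection: first observed in 2020 or later, OR listed in at
    least two of the seven vendor reports."""
    return first_seen_year >= 2020 or report_count >= 2


def assign_family(verdicts, min_malicious=45, min_ransomware=15,
                  min_same_family=10):
    """Return the family name for one sample, or None if it is rejected.

    `verdicts` holds one (is_malicious, is_ransomware, family_name) tuple
    per antivirus engine; family_name is None when the engine names none.
    """
    malicious = sum(1 for is_mal, _, _ in verdicts if is_mal)
    ransomware = sum(1 for _, is_ransom, _ in verdicts if is_ransom)
    nominations = Counter(name for _, _, name in verdicts if name)

    if malicious < min_malicious or ransomware < min_ransomware:
        return None
    if not nominations:
        return None
    # Assumption: the most frequently named family is the candidate label.
    family, votes = nominations.most_common(1)[0]
    return family if votes >= min_same_family else None


# Made-up verdicts: 50 engines say malicious, 20 say ransomware and name "conti".
accepted = [(True, True, "conti")] * 20 + [(True, False, None)] * 30
# Enough detections overall, but no family reaches ten nominations.
split_vote = ([(True, True, "conti")] * 9 + [(True, True, "ryuk")] * 8
              + [(True, False, None)] * 30)

print(assign_family(accepted))    # conti
print(assign_family(split_vote))  # None
```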
These files are Windows executable binaries with minimum and maximum sizes of 14.8 kB and 12.2 MB, respectively. To diversify the non-malicious executable files, we collected samples from the PortableApps (https://portableapps.com) and Softonic (https://en.softonic.com) databases and from a standard Windows 10 installation, as well as by searching the web for no-install software. All of these were scanned using ESET NOD32 Antivirus (https://www.eset.com) version 18.104.22.168 to ensure their safety.
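Because the smallest sample is 14.8 kB, every file supplies the full 1024 header bytes used as features. A row of features for a new executable could be produced roughly as follows. This is a minimal sketch under stated assumptions, not the extraction script used for the dataset: the function name `header_features` is hypothetical, and rejecting files shorter than 1024 bytes (rather than padding them) is a choice made here for illustration.

```python
import os
import tempfile

HEADER_LEN = 1024  # number of leading bytes kept per sample


def header_features(path, length=HEADER_LEN):
    """Return the first `length` bytes of a file as a list of ints (0..255)."""
    with open(path, "rb") as handle:
        raw = handle.read(length)
    if len(raw) < length:
        # Assumption: short files are rejected instead of zero-padded.
        raise ValueError(f"{path}: only {len(raw)} bytes available")
    return list(raw)


def starts_with_mz(features):
    """PE files begin with the DOS signature 'MZ' (bytes 0x4D, 0x5A)."""
    return features[:2] == [0x4D, 0x5A]


# Demo on a synthetic 2048-byte file that only mimics the 'MZ' signature.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.bin")
    with open(demo_path, "wb") as handle:
        handle.write(b"MZ" + bytes(2046))
    demo_features = header_features(demo_path)

print(len(demo_features), starts_with_mz(demo_features))
```

The resulting list maps directly onto columns 0 to 1023 of the CSV layout described earlier.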