Droidware Android Malware Dataset
Description
Droidware is an Android malware dataset developed at the Cybersecurity Lab, GLA University. It contains 265,423 app instances: 140,722 benign and 124,701 malicious. The current release includes applications collected up to 2026, so it covers recent as well as older malware. Each application is uniquely identified by its SHA-256 hash, which supports reproducibility and independent verification. Each app instance is represented by 68 features extracted from function call graphs, permission metadata, and Java source code, capturing both structural and behavioral characteristics. Labels are binary: 0 denotes a benign application and 1 denotes a malicious one. The dataset is intended for training and evaluating machine learning and deep learning models for Android malware detection, and it supports research on robust classification, generalization under distribution shift, and large-scale threat analysis.
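As a quick sanity check after download, the class counts can be compared against the figures stated above. The sketch below is illustrative only: it assumes the data is available as a CSV with a header row, and the column names `sha256` and `label` are hypothetical, since the actual file layout is not described here.

```python
import csv
import io
from collections import Counter


def summarize_labels(csv_file, label_column="label"):
    """Count benign (0) and malicious (1) rows in an open CSV file object.

    The column name is an assumption; adjust it to the real header.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        label = int(row[label_column])
        if label not in (0, 1):
            raise ValueError(f"unexpected label: {label}")
        counts[label] += 1
    return {
        "benign": counts[0],
        "malicious": counts[1],
        "total": counts[0] + counts[1],
    }


# Tiny in-memory stand-in for the real file (hypothetical rows).
sample = io.StringIO("sha256,label\naaa,0\nbbb,1\nccc,0\n")
summary = summarize_labels(sample)
print(summary)

# Counts stated in the description, for comparison with the full dataset.
EXPECTED = {"benign": 140_722, "malicious": 124_701, "total": 265_423}
```

On the full dataset, the returned dictionary should equal `EXPECTED` if the download is complete and the label column is read correctly.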
Steps to reproduce
Reverse engineering is performed on each APK to extract its underlying source representation. Features are derived from function call graphs, Android permissions, and Java source code, enabling both structural and behavioral characterization of applications. The dataset is organized into two classes: 140,722 benign Android applications and 124,701 malicious applications, with each instance uniquely identified using its SHA-256 hash and labeled using a binary scheme (0 for benign, 1 for malware).
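The identification step can be illustrated with Python's standard `hashlib` module. This is a minimal sketch of computing a SHA-256 digest for a file on disk, not the authors' tooling; the function name `sha256_of_file` and the temporary stand-in file are assumptions made for the example.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks
    so that large APKs do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration on a small temporary file standing in for an APK.
with tempfile.NamedTemporaryFile(delete=False, suffix=".apk") as tmp:
    tmp.write(b"abc")
    demo_path = tmp.name

demo_hash = sha256_of_file(demo_path)
os.remove(demo_path)
print(demo_hash)
```

A digest computed this way from an independently obtained APK can be matched against the dataset's identifiers to confirm that both refer to the same file.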
Institutions
- GLA University, Mathura, Uttar Pradesh