
Datasets Comparison

Version 1

StealthPhisher

Published: 15 January 2025 | Version 1 | DOI: 10.17632/m2479kmybx.1
Contributors: Tanmay Jha, Harshit Goswami, Chirag Solanki, Dushyant Nagal, Vibhu Yadav

Description

The StealthPhisher dataset is a large, diverse, and up-to-date resource tailored to address the evolving nature of phishing attacks. It contains 336,749 records, comprising 160,943 legitimate URLs and 175,806 phishing URLs, sourced from platforms like PhishTank, spam email repositories, and user submissions. The dataset reflects recent phishing tactics, which makes it useful for training AI models to detect modern threats. Key features include URL-based attributes (length, TLD type, IP presence), statistical metrics (Shannon Entropy, Kolmogorov Complexity, Fractal Dimension), and HTML/interaction-based data (popups, redirects, forms). Together these features describe phishing behavior at the URL, content, and interaction levels. Designed to capture real-world scenarios, the dataset is intended to help AI models identify both traditional phishing strategies and advanced, evolving attacks.

Categories

Cybersecurity, Machine Learning, Deep Learning, Cyber Attack

Licence

Creative Commons Attribution 4.0 International

Version 2

StealthPhisher Phishing Attack Dataset

Published: 7 November 2025 | Version 2 | DOI: 10.17632/m2479kmybx.2
Contributors: Tanmay Jha, Harshit Goswami, Chirag Solanki, Dushyant Nagal, Vibhu Yadav

Description

The StealthPhisher Phishing Attack Dataset, generated at the Cybersecurity Lab, GLA University, Mathura, is a large, diverse, and recent collection developed to address the evolving nature of phishing attacks. It comprises 336,749 records, including 160,943 legitimate URLs and 175,806 phishing URLs, collected from reliable sources such as PhishTank. Because it reflects recent phishing tactics, the dataset is a useful resource for training and evaluating AI-based phishing detection systems. Key features include URL-based attributes (e.g., length, TLD type, IP presence), statistical metrics (e.g., Shannon Entropy, Kolmogorov Complexity, Fractal Dimension), and HTML/interaction-based features (e.g., popups, redirects, forms). Together these attributes describe phishing behavior at the URL, content, and interaction levels. Designed to capture real-world scenarios, the dataset is intended to help AI models recognize both traditional and emerging phishing strategies. This dataset was generated as part of the research work presented in the article “StealthPhisher: A Defensive Framework against Phishing Attack using Hybrid Deep Learning and GenAI,” published in Expert Systems with Applications (https://doi.org/10.1016/j.eswa.2025.130205). Researchers who use this dataset are kindly requested to cite this article.
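As an illustration of the URL-based and statistical attributes named in the description, the following minimal Python sketch computes URL length, a naive TLD, IP presence, and character-level Shannon entropy for a single URL. It is not the dataset authors' extraction code: the function names and dictionary keys are hypothetical, the TLD is simply the last host label, and the dataset's actual column names and feature definitions may differ (see the cited article for the real methodology).

```python
import ipaddress
import math
from collections import Counter
from urllib.parse import urlsplit


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def host_is_ip(url: str) -> bool:
    """True if the host part of the URL is a literal IPv4 or IPv6 address."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def url_features(url: str) -> dict:
    """Hypothetical feature record; keys are illustrative, not dataset columns."""
    host = urlsplit(url).hostname or ""
    is_ip = host_is_ip(url)
    # Naive TLD: last dot-separated label of the host (ignores multi-part suffixes).
    tld = host.rsplit(".", 1)[-1] if "." in host and not is_ip else ""
    return {
        "url_length": len(url),
        "tld": tld,
        "has_ip": is_ip,
        "entropy": shannon_entropy(url),
    }
```

Calling `url_features("http://192.168.0.1/login")` would flag `has_ip` as true and leave `tld` empty, which is the kind of signal the IP-presence attribute is meant to capture.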

Steps to reproduce

Please refer to the detailed methodology described in the article https://doi.org/10.1016/j.eswa.2025.130205

Institutions

GLA University Institute of Engineering and Technology

Categories

Cybersecurity, Machine Learning, Deep Learning, Cyber Attack

Licence

Creative Commons Attribution 4.0 International