Datasets Comparison

Version 1

StealthPhisher

Published: 15 January 2025 | Version 1 | DOI: 10.17632/m2479kmybx.1

Contributors: Tanmay Jha, Harshit Goswami, Chirag Solanki, Dushyant Nagal, Vibhu Yadav

Description

The StealthPhisher dataset is a large, diverse, and up-to-date resource designed to address the evolving nature of phishing attacks. It contains 336,749 records: 160,943 legitimate URLs and 175,806 phishing URLs, sourced from platforms such as PhishTank, spam email repositories, and user submissions. Because it reflects recent phishing tactics, the dataset is well suited to training AI models to detect modern threats. Key features include URL-based attributes (length, TLD type, IP presence), statistical metrics (Shannon entropy, Kolmogorov complexity, fractal dimension), and HTML/interaction-based data (popups, redirects, forms). Together, these features describe phishing behavior from several angles, supporting precise detection. Designed to capture real-world scenarios, the dataset equips AI models to identify both traditional phishing strategies and more advanced, evolving attacks. Its scale and focus on recent trends make it a useful resource for advancing AI-driven cybersecurity solutions.
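The description lists URL-based attributes and Shannon entropy but does not define how they are computed. As an illustration only, here is a minimal Python sketch of three URL-based attributes (length, TLD, IP presence) plus character-level Shannon entropy, H = -Σ p_i log2 p_i. The function names and the naive "last label of the host" TLD rule are assumptions, not the dataset's actual feature code.

```python
import ipaddress
import math
from collections import Counter
from urllib.parse import urlsplit


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits: H = -sum(p_i * log2(p_i))."""
    if not text:
        return 0.0
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())


def is_ip_literal(host: str) -> bool:
    """True if the host is a bare IPv4/IPv6 address instead of a domain name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def url_features(url: str) -> dict:
    """Hypothetical lexical features; the dataset's own definitions may differ."""
    # .hostname is lower-cased, with any port and IPv6 brackets stripped.
    host = urlsplit(url).hostname or ""
    has_ip = is_ip_literal(host)
    # Naive TLD: last dot-separated label; multi-part suffixes such as co.uk are not handled.
    tld = "" if has_ip or "." not in host else host.rsplit(".", 1)[1]
    return {
        "url_length": len(url),
        "tld": tld,
        "has_ip": has_ip,
        "entropy": shannon_entropy(url),
    }


if __name__ == "__main__":
    for u in ("http://192.168.0.1/login", "https://www.example.com/account/verify"):
        print(u, url_features(u))
```

For the first URL the sketch reports `has_ip=True` and an empty `tld`, since an IP-literal host has no top-level domain. A production pipeline would more likely use a public-suffix list for TLDs.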

Licence

Creative Commons Attribution 4.0 International

Version 2

StealthPhisher Phishing Attack Dataset

Published: 7 November 2025 | Version 2 | DOI: 10.17632/m2479kmybx.2

Contributors: Tanmay Jha, Harshit Goswami, Chirag Solanki, Dushyant Nagal, Vibhu Yadav

Description

The StealthPhisher Phishing Attack Dataset, generated at the Cybersecurity Lab, GLA University, Mathura, is a large, diverse, and recent collection of URLs developed to address the evolving nature of phishing attacks. It comprises 336,749 records: 160,943 legitimate URLs and 175,806 phishing URLs, collected from reliable sources such as PhishTank. Because it reflects recent phishing tactics, the dataset is a valuable resource for training and evaluating AI-based phishing detection systems. Key features include URL-based attributes (e.g., length, TLD type, IP presence), statistical metrics (e.g., Shannon entropy, Kolmogorov complexity, fractal dimension), and HTML/interaction-based features (e.g., popups, redirects, forms). These multidimensional attributes describe phishing behavior from complementary angles, supporting accurate and robust threat detection. Designed to capture real-world scenarios, the dataset helps AI models recognize both traditional and emerging phishing strategies.

This dataset was generated as part of the research presented in the article “StealthPhisher: A Defensive Framework against Phishing Attack using Hybrid Deep Learning and GenAI,” published in Expert Systems with Applications (https://doi.org/10.1016/j.eswa.2025.130205). Researchers who use this dataset are requested to cite this article.
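The description names Kolmogorov complexity and HTML/interaction features without defining them. The Python sketch below shows one common way to approximate each, assuming a zlib compression ratio as a stand-in for Kolmogorov complexity (which is uncomputable) and simple tag and inline-script scans for forms, meta-refresh redirects, and `window.open()` popups. The helper names `compression_complexity`, `InteractionCounter`, and `html_features` are hypothetical; the StealthPhisher article describes the authors' actual method.

```python
import zlib
from html.parser import HTMLParser


def compression_complexity(text: str) -> float:
    """Compression-based proxy for Kolmogorov complexity: zlib size / raw size.

    Kolmogorov complexity itself is uncomputable; compressed length is a common
    upper-bound estimate. For very short strings the ratio can exceed 1 because
    of zlib's fixed header and checksum overhead.
    """
    raw = text.encode("utf-8")
    if not raw:
        return 0.0
    return len(zlib.compress(raw, 9)) / len(raw)


class InteractionCounter(HTMLParser):
    """Counts three illustrative signals: <form> tags, meta-refresh redirects, window.open() popups."""

    def __init__(self) -> None:
        super().__init__()
        self.forms = 0
        self.meta_redirects = 0
        self.popups = 0
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; attribute values may be None.
        values = dict(attrs)
        if tag == "form":
            self.forms += 1
        elif tag == "meta" and (values.get("http-equiv") or "").lower() == "refresh":
            self.meta_redirects += 1
        elif tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Only inline script text is scanned; external scripts are not fetched.
        if self._in_script:
            self.popups += data.count("window.open(")


def html_features(html: str) -> dict:
    """Run the counter over one page and return the three counts."""
    parser = InteractionCounter()
    parser.feed(html)
    parser.close()
    return {
        "forms": parser.forms,
        "meta_redirects": parser.meta_redirects,
        "popups": parser.popups,
    }
```

Highly repetitive strings compress well and score low, while varied strings score higher. The HTML counts could be joined with URL-level features into one feature row per record, though server-side (HTTP 3xx) redirects and script-generated forms would need a crawler or headless browser to capture.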

Steps to reproduce

Refer to the detailed methodology described in the article: https://doi.org/10.1016/j.eswa.2025.130205

Institutions

GLA University Institute of Engineering and Technology

Licence

Creative Commons Attribution 4.0 International
