StealthPhisher Phishing Attack Dataset

Published: 7 November 2025 | Version 2 | DOI: 10.17632/m2479kmybx.2
Contributors:
Tanmay Jha, Harshit Goswami, Chirag Solanki, Dushyant Nagal, Vibhu Yadav

Description

The StealthPhisher Phishing Attack Dataset was generated at the Cybersecurity Lab, GLA University, Mathura. It is a large, diverse, and recent dataset developed to address the evolving nature of phishing attacks. It comprises 336,749 records: 160,943 legitimate URLs and 175,806 phishing URLs, collected from reliable sources such as PhishTank. Because it reflects recent phishing tactics, the dataset is intended as a resource for training and evaluating AI-based phishing detection systems.

Features fall into three groups: URL-based attributes (e.g., length, TLD type, IP presence), statistical metrics (e.g., Shannon Entropy, Kolmogorov Complexity, Fractal Dimension), and HTML/interaction-based features (e.g., popups, redirects, forms). Together, these attributes describe phishing behavior from several angles. The dataset is designed to capture real-world scenarios so that AI models trained on it can recognize both traditional and emerging phishing strategies.

This dataset was generated as part of the research presented in the article “StealthPhisher: A Defensive Framework against Phishing Attack using Hybrid Deep Learning and GenAI,” published in Expert Systems with Applications (https://doi.org/10.1016/j.eswa.2025.130205). Researchers using this dataset are kindly requested to cite this article.
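
To make the URL-based and statistical feature groups more concrete, the following is a minimal Python sketch of how three of the named attributes (URL length, IP presence, and Shannon entropy) could be computed from a raw URL. It is an illustration only and not the authors' extraction pipeline; the function names, the returned dictionary keys, and the choice to compute entropy over the characters of the full URL are assumptions, and the dataset's actual column names and definitions are given in the article.

```python
import ipaddress
import math
from collections import Counter
from urllib.parse import urlsplit


def shannon_entropy(text: str) -> float:
    """Shannon entropy of a string, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def host_is_ip(url: str) -> bool:
    """True if the URL's host is a literal IPv4 or IPv6 address."""
    # urlsplit only finds a host when the URL includes a scheme ("http://...").
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def url_features(url: str) -> dict:
    """Hypothetical feature record; the keys are illustrative, not dataset columns."""
    host = urlsplit(url).hostname or ""
    has_ip = host_is_ip(url)
    # Naive TLD: the last dot-separated label of a non-IP host.
    tld = host.rsplit(".", 1)[-1] if "." in host and not has_ip else ""
    return {
        "length": len(url),
        "has_ip": has_ip,
        "tld": tld,
        "entropy": shannon_entropy(url),
    }


if __name__ == "__main__":
    print(url_features("https://example.com/login"))
    print(url_features("http://192.168.0.1/verify?id=1"))
```

The naive TLD split treats only the final label as the TLD, so multi-label suffixes such as `co.uk` would need a public-suffix list; that refinement is left out to keep the sketch dependency-free.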

Steps to reproduce

Please refer to the detailed methodology described in the article (https://doi.org/10.1016/j.eswa.2025.130205).

Institutions

GLA University Institute of Engineering and Technology

Categories

Cybersecurity, Machine Learning, Deep Learning, Cyber Attack

Licence