Zieni dataset for Phishing detection

Published: 4 September 2024 | Version 1 | DOI: 10.17632/8mcz8jsgnb.1
Contributor:
Rasha Zieni

Description

This dataset was used to train machine learning models for phishing detection and to study the explainability of those models. It was published in 2024. The dataset covers phishing and legitimate websites. Phishing samples were collected from two sources, PhishTank and Tranco, whereas legitimate samples were collected from Alexa. The dataset is balanced: it contains 5,000 phishing and 5,000 legitimate samples, each described by 74 features extracted from the entire URL as well as from the Fully Qualified Domain Name (FQDN), pathname, filename, and parameters. Of these features, 70 are numerical and four are binary. The target variable is also binary.
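
To illustrate what features "extracted from the entire URL as well as from the Fully Qualified Domain Name, pathname, filename, and parameters" could look like, here is a minimal Python sketch. It only shows how a URL can be split into those parts and how a few numerical and binary values might be derived from them. The function names and feature names (`split_url`, `example_features`, `url_length`, `uses_https`, and so on) are hypothetical and chosen for illustration; they are not the dataset's actual 74 features, and the example URL is made up.

```python
from urllib.parse import urlsplit, parse_qsl
import posixpath


def split_url(url: str) -> dict:
    """Split a URL into the parts named in the description.

    The key names here are illustrative assumptions, not the
    dataset's own column names.
    """
    parts = urlsplit(url)
    return {
        "url": url,
        "scheme": parts.scheme,
        "fqdn": parts.hostname or "",                 # host name, lowercased
        "pathname": parts.path,                       # e.g. /a/b/index.php
        "filename": posixpath.basename(parts.path),   # last path segment
        "parameters": parts.query,                    # raw query string
    }


def example_features(url: str) -> dict:
    """Derive a handful of hypothetical features from the URL parts."""
    c = split_url(url)
    return {
        # Numerical features (counts and lengths)
        "url_length": len(c["url"]),
        "fqdn_dot_count": c["fqdn"].count("."),
        "path_depth": c["pathname"].count("/"),
        "filename_length": len(c["filename"]),
        "parameter_count": len(parse_qsl(c["parameters"], keep_blank_values=True)),
        # Binary feature (0 or 1)
        "uses_https": int(c["scheme"] == "https"),
    }


if __name__ == "__main__":
    # Made-up URL, used only to exercise the functions above.
    sample = "https://login.example.com/account/verify/index.php?user=a&token=b"
    for name, value in example_features(sample).items():
        print(f"{name}: {value}")
```

Each feature is computed from a single URL part, mirroring the per-component grouping the description mentions. The real feature definitions should be taken from the dataset files themselves.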

Files

Categories

Cybersecurity, Machine Learning, Explainable Artificial Intelligence

Licence