Published: 5 October 2021 | Version 1 | DOI: 10.17632/ttmmtsgbs8.1


PhishRepo was built to fill the data gap in the anti-phishing domain and is still at an experimental stage. The data available here was collected by PhishRepo during its testing stage and consists of verified phishing webpages; because of this, the dataset contains only a few data points. It brings together diverse information sources related to recent phishing pages. Diverse, feature-rich data of this kind is currently needed in machine learning-based anti-phishing research to overcome weak learning models in phishing detection. The dataset can be used to analyse significant phishing features, to experiment with different feature extraction techniques, and to try out representation learning techniques such as deep learning on the raw data at a practical level. The dataset contains an index.csv file, which is the main file and should be used to map its entries to the available folders. Generally, a folder should contain the files webpage.html, alexa.xml, response.csv, screenshot.png and fullview.png, and a src folder that holds the offline webpage resources. If anything is missing from a folder, this is indicated in the index.csv file.
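The folder layout described above can be checked locally before any feature extraction. The following is a minimal Python sketch, not part of PhishRepo itself: the function names `missing_items` and `audit_dataset` are hypothetical, and it assumes that the sample folders sit directly under the extracted dataset root, which the description does not state. Only the file and folder names are taken from the description; the columns of index.csv are not documented here, so the sketch does not read that file.

```python
from pathlib import Path

# File and folder names as listed in the dataset description.
EXPECTED_FILES = (
    "webpage.html",
    "alexa.xml",
    "response.csv",
    "screenshot.png",
    "fullview.png",
)
EXPECTED_DIRS = ("src",)


def missing_items(sample_dir):
    """Return the expected files and folders absent from one sample folder."""
    sample_dir = Path(sample_dir)
    missing = [name for name in EXPECTED_FILES if not (sample_dir / name).is_file()]
    missing += [name for name in EXPECTED_DIRS if not (sample_dir / name).is_dir()]
    return missing


def audit_dataset(root):
    """Map each sample folder name under root to its list of missing items.

    Assumption: every sub-folder of root is one sample folder. Plain files
    in root (for example index.csv) are skipped.
    """
    root = Path(root)
    return {
        path.name: missing_items(path)
        for path in sorted(root.iterdir())
        if path.is_dir()
    }
```

Calling `audit_dataset` on the extracted dataset folder returns a dictionary whose non-empty values list what each sample lacks; those results can then be compared by hand with the missing-item information recorded in index.csv.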


Steps to reproduce

The dataset can be downloaded from the PhishRepo data repository.


University of Moratuwa, Uva Wellassa University


Artificial Intelligence, Data Science, Applied Computing, Machine Learning