For the experiments on imbalanced problems, 50 imbalanced data sets from the Knowledge Extraction based on Evolutionary Learning repository (KEEL: http://www.keel.es/) are used in this paper. Each data set is stored as a 5x3 cell with 5 rows and 3 columns. Each row holds the data for one fold of the 5-fold cross-validation. The first column is the training data of the minority class, the second column is the training data of the majority class, and the third column is the testing data. Within each element of the cell, every row is one sample, and the last column is the label.
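The fold layout described above can be read as in the following minimal sketch. It assumes the 5x3 cell has already been loaded into a nested, indexable structure (for example, a MATLAB cell array read with `scipy.io.loadmat` arrives as an object array that can be indexed the same way). The names `fold_data`, `split_features_labels`, and `toy_cell`, as well as the toy values and the 0/1 label encoding, are hypothetical and only illustrate the indexing.

```python
def split_features_labels(samples):
    """Split rows of [feature_1, ..., feature_n, label] into features and labels."""
    features = [list(row[:-1]) for row in samples]
    labels = [row[-1] for row in samples]
    return features, labels


def fold_data(cell, fold):
    """Return (X_train, y_train, X_test, y_test) for one cross-validation fold.

    cell[fold][0]: minority-class training samples
    cell[fold][1]: majority-class training samples
    cell[fold][2]: testing samples
    """
    minority_train = cell[fold][0]
    majority_train = cell[fold][1]
    test = cell[fold][2]
    # The training set is the union of the minority and majority samples.
    train = list(minority_train) + list(majority_train)
    X_train, y_train = split_features_labels(train)
    X_test, y_test = split_features_labels(test)
    return X_train, y_train, X_test, y_test


# Toy stand-in for one data set: 5 folds, each with 3 columns.
# The label values (1 = minority, 0 = majority) are an assumption.
toy_cell = [
    [
        [[0.1, 0.2, 1]],                  # minority-class training data
        [[0.9, 0.8, 0], [0.7, 0.6, 0]],   # majority-class training data
        [[0.3, 0.4, 1], [0.5, 0.5, 0]],   # testing data
    ]
    for _ in range(5)
]

for fold in range(5):
    X_train, y_train, X_test, y_test = fold_data(toy_cell, fold)
    print(f"fold {fold}: {len(X_train)} training samples, {len(X_test)} testing samples")
```

Keeping the minority and majority columns separate until this step makes it straightforward to apply resampling to only one class before the two are merged.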
This is the dataset we simulated for the experiments in "Long Short-Term Memory-Based Deep Recurrent Neural Networks for Target Tracking". It contains the training data we used to train our networks and the test data we used to obtain the results in the paper.
This dataset contains 48 features extracted from 5000 phishing webpages and 5000 legitimate webpages, which were downloaded from January to May 2015 and from May to June 2017. The features were extracted with an improved technique that uses a browser automation framework (Selenium WebDriver), which is more precise and robust than parsing approaches based on regular expressions. This dataset is WEKA-ready.
Phishing webpage source: PhishTank, OpenPhish
Legitimate webpage source: Alexa, Common Crawl
Anti-phishing researchers and experts may find this dataset useful for analysing phishing features, conducting rapid proof-of-concept experiments, or benchmarking phishing classification models.