Experimental Dataset for Imbalanced Classification: Application of Relabeling & Ranking Algorithm

Name: Experimental Dataset for Imbalanced Classification: Application of Relabeling & Ranking Algorithm
Creator: Seunghwan Park
Published: 2024-01-16T15:41:12.235Z
Keywords: Class Imbalance

Park, Seunghwan; Im, Jongho

doi:10.17632/pb3jd7vz9z.1

Experimental Dataset for Imbalanced Classification: Application of Relabeling & Ranking Algorithm

Published: 16 January 2024 | Version 1 | DOI: 10.17632/pb3jd7vz9z.1

Contributors:

Seunghwan Park

Description

The datasets used in the study "Relabeling & Ranking Algorithm for Imbalanced Classification" were sourced from several public repositories: 1) the Knowledge Extraction based on Evolutionary Learning (KEEL) data repository (J. Alcalá-Fdez, A. Fernandez, J. Luengo, J. Derrac, S. García, L. Sánchez and F. Herrera, 2011), 2) the UCI machine learning repository (Dua and Graff, 2017), 3) the HDDT collection (Cieslak et al., 2012), and 4) previous studies (Radivojac et al., 2004; Kubat et al., 1998; Woods et al., 1993). These datasets are notably imbalanced and are widely used in the academic literature on class imbalance for that reason.

Two main criteria were used to select these datasets:
1) Large-scale focus: preference was given to large-scale datasets, a category often overlooked in previous studies. Ten of the 16 real-world datasets have more than 1,000 instances, and four of them have more than 10,000 instances.
2) High imbalance ratio (IR): the primary focus was on highly imbalanced datasets, specifically those with an IR (the number of majority-class instances divided by the number of minority-class instances) greater than 9.

The datasets were categorized by the types of feature variables they contain:
1) Continuous datasets: all feature variables are continuous.
2) Categorical datasets: all feature variables are categorical.
3) Mixed datasets: a combination of continuous and categorical feature variables.
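The IR selection criterion and the three feature-type categories described above are simple enough to express directly in code. The Python sketch below is illustrative only and is not taken from the authors' materials. The function names (`imbalance_ratio`, `meets_ir_criterion`, `feature_category`) are hypothetical. Class labels are assumed to be a plain list of values, and each feature's type is assumed to be tagged as the string "continuous" or "categorical". For multiclass labels, the sketch divides the size of the largest class by the size of the smallest, which reduces to majority over minority in the binary case.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return IR = (size of largest class) / (size of smallest class)."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("IR needs at least two classes")
    return max(counts.values()) / min(counts.values())


def meets_ir_criterion(labels, threshold=9.0):
    """Hypothetical filter for the stated criterion: keep datasets with IR > 9 (strict)."""
    return imbalance_ratio(labels) > threshold


def feature_category(feature_types):
    """Classify a dataset as 'continuous', 'categorical' or 'mixed'.

    feature_types: iterable of per-feature tags, assumed to be the strings
    "continuous" or "categorical" (an illustrative encoding, not the authors').
    """
    kinds = set(feature_types)
    if kinds == {"continuous"}:
        return "continuous"
    if kinds == {"categorical"}:
        return "categorical"
    if kinds == {"continuous", "categorical"}:
        return "mixed"
    raise ValueError(f"unexpected feature types: {sorted(kinds)}")


if __name__ == "__main__":
    labels = ["neg"] * 950 + ["pos"] * 50
    print(imbalance_ratio(labels))
    print(meets_ir_criterion(labels))
    print(feature_category(["continuous", "categorical", "continuous"]))
```

For example, a dataset with 950 majority and 50 minority instances has IR = 19 and passes the IR > 9 criterion. A 90/10 split gives IR = 9 exactly and does not pass, because the threshold is strict.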

Institutions

Kangwon National University
Yonsei University

Funders

National Research Foundation of Korea
South Korea
Grant ID: NRF-2019R1G1A1002232
National Research Foundation of Korea
South Korea
Grant ID: NRF-2021R1C1C1014407
National Research Foundation of Korea
South Korea
Grant ID: NRF-2022R1A4A1033384