OpenML study 7 - meta-datasets
Description
From OpenML we retrieved data from an earlier meta-learning study (Details can be found on https://www.openml.org/s/7). Although we had to exclude a few tasks and algorithms because they lacked sufficient evaluations in OpenML, this yielded a set of 10840 evaluations on 351 tasks (datasets) and 53 machine learning methods (called flows on OpenML) from mlr (Bischl et al., 2016). From each task, 21 dataset descriptors were extracted, such as the number of examples, number of missing values, and percentage of numeric features. We formed meta-datasets, one for each machine learning method. An observation within a meta-dataset represents an original OpenML task, and each feature, a dataset descriptor. The original aim of the study was to predict the area under the ROC (AUC). Therefore, in total, we produced 53 meta-datasets with a diverse number of OpenML tasks, ranging from above 100 to about 250.
Files
Steps to reproduce
From OpenML we retrieved data from an earlier meta-learning study (Details can be found on https://www.openml.org/s/7). Although we had to exclude a few tasks and algorithms because they lacked sufficient evaluations in OpenML, this yielded a set of 10840 evaluations on 351 tasks (datasets) and 53 machine learning methods (called flows on OpenML) from mlr (Bischl et al., 2016). From each task, 21 dataset descriptors were extracted, such as the number of examples, number of missing values, and percentage of numeric features. We formed meta-datasets, one for each machine learning method. An observation within a meta-dataset represents an original OpenML task, and each feature, a dataset descriptor. The original aim of the study was to predict the area under the ROC (AUC). Therefore, in total, we produced 53 meta-datasets with a diverse number of OpenML tasks, ranging from above 100 to about 250.