OpenML study 7 - meta-datasets

Published: 4 November 2020 | Version 1 | DOI: 10.17632/7xx7ty87x2.1
Contributor:
Ivan Olier

Description

We retrieved data from an earlier meta-learning study on OpenML (details can be found at https://www.openml.org/s/7). After excluding a few tasks and algorithms that lacked sufficient evaluations in OpenML, this yielded 10,840 evaluations covering 351 tasks (datasets) and 53 machine learning methods (called flows on OpenML) from mlr (Bischl et al., 2016). From each task, 21 dataset descriptors were extracted, such as the number of examples, the number of missing values, and the percentage of numeric features. We formed one meta-dataset per machine learning method. Within a meta-dataset, each observation represents an original OpenML task and each feature is a dataset descriptor. The original aim of the study was to predict the area under the ROC curve (AUC). In total, we produced 53 meta-datasets, each covering a different number of OpenML tasks, ranging from just over 100 to about 250.
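
The construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the contributor's actual pipeline: the function name build_meta_datasets, the tuple layout of the evaluations, the descriptor names, the min_tasks threshold, and all values in the toy data are hypothetical. The description does not state which cut-off was used to exclude sparsely evaluated algorithms.

```python
from collections import defaultdict


def build_meta_datasets(evaluations, descriptors, min_tasks=1):
    """Build one meta-dataset per flow (machine learning method).

    evaluations: iterable of (flow_id, task_id, auc) tuples (assumed layout).
    descriptors: dict mapping task_id -> dict of descriptor name -> value.
    min_tasks:   hypothetical threshold; flows with fewer rows are dropped.
    """
    meta = defaultdict(list)
    for flow_id, task_id, auc in evaluations:
        if task_id not in descriptors:
            continue  # skip tasks for which no descriptors are available
        # One observation = one task: its descriptors plus the AUC target.
        row = dict(descriptors[task_id])
        row["task_id"] = task_id
        row["auc"] = auc
        meta[flow_id].append(row)
    # Keep only flows that were evaluated on enough tasks.
    return {flow: rows for flow, rows in meta.items() if len(rows) >= min_tasks}


# Toy data, invented for illustration only.
descriptors = {
    1: {"n_examples": 150, "n_missing": 0, "pct_numeric": 100.0},
    2: {"n_examples": 1000, "n_missing": 12, "pct_numeric": 40.0},
    3: {"n_examples": 500, "n_missing": 3, "pct_numeric": 75.0},
}
evaluations = [
    ("flow_a", 1, 0.91), ("flow_a", 2, 0.78), ("flow_a", 3, 0.85),
    ("flow_b", 1, 0.88), ("flow_b", 2, 0.80),
    ("flow_c", 3, 0.70),
]

meta_datasets = build_meta_datasets(evaluations, descriptors, min_tasks=2)
for flow, rows in sorted(meta_datasets.items()):
    print(flow, len(rows))
```

With the toy data, flow_c is dropped because it has a single evaluation, which mirrors why the number of tasks can differ from one meta-dataset to another.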
