Binary-Classification Performance Metric-Spaces Data

Published: 12 August 2020 | Version 2 | DOI: 10.17632/64r4jr8c88.2
Gürol Canbek, Tugba Taskaya Temizel, Seref Sagiroglu


Metric-space is a concept proposed by Gürol Canbek et al. (2019). A metric-space comprises all possible permutations of the contingency table (or confusion matrix) elements that yield the same sample size (Sn). It holds all possible results of a hypothetical classification conducted on a dataset with a given sample size, expressed in terms of one or more performance metrics (e.g., Accuracy, F1, or TPR). A metric-space provides a pseudo-universal space in which to analyze and compare metrics with complete coverage. The formal definition and the details are given in the article.

Each data file has the following 13 performance measures and 13 performance metrics:

* True Positive (TP), False Positive (FP), False Negative (FN), True Negative (TN), Positive (P), Negative (N), Outcome Positive (OP), Outcome Negative (ON), True Classification (TC), False Classification (FC), Sample Size (Sn), Prevalence (PREV), Bias (BIAS)
* True Positive Rate (TPR), True Negative Rate (TNR), Positive Predictive Value (PPV), Negative Predictive Value (NPV), Accuracy (ACC), Informedness (INFORM), Markedness (MARK), Balanced Accuracy (BACC), G, Normalized Mutual Information (nMI), F1, Cohen’s Kappa (CK), and Matthews Correlation Coefficient (MCC)

Each data file holds the metric-space for a different Sn value (10, 25, 50, 75, 100, 125, 150, 175, 200, 225). The file format is RData (compatible with The R Project for Statistical Computing) instead of CSV (comma-separated values) because of large CSV file sizes.
Therefore, MATLAB users should load the files in R and save them as CSV:

> load('MetricSpaces_Sn_10.RData')
> metric_spaces_Sn_10 <- data.frame(TP, FP, FN, TN, P, N, OP, ON, TC, FC, Sn, PREV, BIAS, TPR, TNR, PPV, NPV, ACC, INFORM, MARK, BACC, G, nMI, F1, CK, MCC)
> write.csv(metric_spaces_Sn_10, file='MetricSpaces_Sn_10.csv')

Note that metric-space sizes (numbers of permutations) grow rapidly, roughly with the cube of Sn: Sn=25 (3,276); Sn=50 (23,426); Sn=75 (76,076); Sn=100 (176,851); Sn=125 (341,376); Sn=150 (585,276); Sn=175 (924,176); Sn=200 (1,373,701); Sn=250 (2,667,126).
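To make the definition concrete, the following is a minimal sketch of how a metric-space could be enumerated for a small Sn. It assumes that each entry is one combination of non-negative integers (TP, FP, FN, TN) summing to Sn. That assumption is consistent with the sizes listed above, which equal choose(Sn + 3, 3), e.g. choose(28, 3) = 3,276 for Sn=25. The function name enumerate_metric_space is hypothetical and is not part of the data files. Only a few measures and metrics are derived, and how the published files represent undefined values (e.g. TPR when P = 0) is not stated here, so this sketch simply lets R produce NaN.

```r
# Sketch only: enumerate every confusion matrix with a given sample size Sn.
# enumerate_metric_space is an illustrative name, not the authors' code.
enumerate_metric_space <- function(Sn) {
  # All (TP, FP, FN) triples of non-negative integers; TN is the remainder.
  grid <- expand.grid(TP = 0:Sn, FP = 0:Sn, FN = 0:Sn)
  grid <- grid[grid$TP + grid$FP + grid$FN <= Sn, ]
  grid$TN <- Sn - grid$TP - grid$FP - grid$FN
  rownames(grid) <- NULL

  # A few measures derived from the four base counts.
  grid$P  <- grid$TP + grid$FN
  grid$N  <- grid$FP + grid$TN
  grid$Sn <- Sn

  # A few metrics; 0/0 yields NaN in R (e.g. TPR when P = 0).
  grid$TPR <- grid$TP / grid$P
  grid$TNR <- grid$TN / grid$N
  grid$ACC <- (grid$TP + grid$TN) / Sn
  grid
}

ms10 <- enumerate_metric_space(10)
nrow(ms10)                        # number of confusion matrices for Sn = 10
nrow(ms10) == choose(10 + 3, 3)   # compare with the stars-and-bars count
```

Because expand.grid first builds (Sn + 1)^3 rows before filtering, this approach is only practical for small Sn; for the larger sample sizes the provided RData files are the better starting point.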



Machine Learning, Machine Learning Theory, Contingency Table Analysis, Performance Measurement, Accuracy Analysis, Classification (Machine Learning), Classifier Evaluation