Data for: Clustering Benchmark Datasets Exploiting the Fundamental Clustering Problems

Published: 26 April 2021 | Version 1 | DOI: 10.17632/vsxvgc4rwy.1
Michael Thrun


This directory contains the Fundamental Clustering Challenges: a collection of data sets, each with a given classification. The data sets are rather simple and low in dimensionality, and each one addresses a specific problem for clustering. Standard clustering algorithms such as single linkage, Ward and k-means are not able to cluster all of the data sets correctly. These data sets may therefore serve as a minimal test for newly invented clustering algorithms: every new algorithm should at least cluster the FCPS correctly. Data sets of dimensionality 3 or higher can also serve to validate dimensionality reduction methods that visualize data by projecting it into a two-dimensional space while preserving its structure. Conventional projection methods, like MDS or PCA, are not able to visualize the cluster structures correctly.

All files are ASCII text files. Columns are separated by TAB, and headers are included. *.lrn files contain the data, including a unique key for each case; *.cls files contain the keys and class labels. Each class is indicated by a positive number. The data can be opened with a *.csv reader or with the R package FCPS on CRAN.
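For readers working outside R, the file layout described above (TAB-separated columns, a unique key in the first column, positive class numbers in the *.cls file) can be read with a few lines of Python. The following is a minimal sketch, not an official loader: the function names `read_keyed_table` and `join_data_and_labels` are hypothetical, the inline sample text is made up for illustration, and the rule that any line whose first field is not an integer counts as a header line is an assumption about the header layout.

```python
import csv
import io


def read_keyed_table(stream):
    """Read a TAB-separated table whose first column is a unique integer key.

    Assumption: any line whose first field is not an integer is a header
    line and is skipped. Returns a dict mapping key -> list of float values.
    """
    rows = {}
    for fields in csv.reader(stream, delimiter="\t"):
        if not fields:
            continue  # skip empty lines
        try:
            key = int(fields[0].strip())
        except ValueError:
            continue  # header line (assumed), not a data row
        rows[key] = [float(value) for value in fields[1:]]
    return rows


def join_data_and_labels(lrn_rows, cls_rows):
    """Match data rows (*.lrn) to class labels (*.cls) by their shared key."""
    data, labels = [], []
    for key in sorted(lrn_rows):
        if key not in cls_rows:
            raise KeyError(f"no class label for key {key}")
        data.append(lrn_rows[key])
        labels.append(int(cls_rows[key][0]))
    return data, labels


# Made-up sample content in the described layout (not taken from the data sets).
LRN_TEXT = "Key\tX\tY\n1\t0.5\t1.5\n2\t2.0\t3.0\n3\t-1.0\t0.25\n"
CLS_TEXT = "Key\tCls\n2\t2\n1\t1\n3\t1\n"

lrn_rows = read_keyed_table(io.StringIO(LRN_TEXT))
cls_rows = read_keyed_table(io.StringIO(CLS_TEXT))
data, labels = join_data_and_labels(lrn_rows, cls_rows)
```

To read an actual file, replace the `io.StringIO` objects with `open(path, newline="")`, where `path` is the location of a *.lrn or *.cls file. Joining on the key rather than on row order keeps the labels aligned even if the two files list the cases in a different order.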



Clustering, Dimensionality Reduction, Projection Method, Pattern Recognition