Data for: kluster: An Efficient Scalable Procedure for Approximating the Number of Clusters in Unsupervised Learning

Published: 19 Jun 2018 | Version 1 | DOI: 10.17632/vfx46vcwpp.1

Description of this data

182 simulated datasets in two sets (the first contains small datasets, the second large datasets) with different cluster compositions, i.e., different numbers of clusters and separation values, generated using the clusterGeneration package in R. Each set consists of 91 datasets in comma-separated values (csv) format (182 csv files in total) with 3 to 15 clusters and separation values from 0.1 to 0.7. Separation values can range over the interval (−0.999, 0.999), where a higher value indicates a structure with more clearly separable clusters.
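The stated counts are consistent with a full grid of cluster counts and separation values. The following is only an arithmetic sanity check in Python, assuming (the description does not say so) that separation values advance in steps of 0.1; under that assumption, 13 cluster counts times 7 separation values gives the 91 datasets per set.

```python
from itertools import product

# Assumption: separation values advance in steps of 0.1.
# The description gives only the range 0.1 to 0.7, not the step.
cluster_counts = range(3, 16)                           # 3..15 clusters
sep_values = [round(0.1 * i, 1) for i in range(1, 8)]   # 0.1..0.7

# Every (number of clusters, separation value) combination in one set.
grid = list(product(cluster_counts, sep_values))
per_set = len(grid)
total = 2 * per_set   # one small-dataset set plus one large-dataset set

print(per_set, total)
```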

The size of the dataset, the number of clusters, and the separation value of the clusters are encoded in each file name, size_X_n_Y_sepval_Z.csv:
X = size of the dataset
Y = number of clusters in the dataset
Z = separation value of the clusters in the dataset
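The naming convention above can be decoded programmatically. The following is a minimal Python sketch, assuming X and Y are written as plain integers and Z as a decimal number; the helper `parse_dataset_name`, the `DatasetInfo` type, and the example file name are hypothetical and not part of the dataset.

```python
import re
from typing import NamedTuple


class DatasetInfo(NamedTuple):
    size: int          # X: size of the dataset
    n_clusters: int    # Y: number of clusters
    sepval: float      # Z: separation value


# Assumed pattern: integer X and Y, decimal Z (optionally negative).
_NAME_PATTERN = re.compile(
    r"^size_(\d+)_n_(\d+)_sepval_(-?\d+(?:\.\d+)?)\.csv$"
)


def parse_dataset_name(filename: str) -> DatasetInfo:
    """Extract size, cluster count, and separation value from a file name."""
    match = _NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    size, n_clusters, sepval = match.groups()
    return DatasetInfo(int(size), int(n_clusters), float(sepval))


# Illustrative file name only; the actual sizes used are not listed here.
info = parse_dataset_name("size_1000_n_5_sepval_0.3.csv")
print(info)
```

Parsing the name up front lets the known number of clusters (Y) serve as ground truth when evaluating an estimate computed from the file contents.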

This data is associated with the following publication:

kluster: An Efficient Scalable Procedure for Approximating the Number of Clusters in Unsupervised Learning

Published in: Big Data Research

Cite this dataset

Estiri, Hossein; Abounia Omran, Behzad; Murphy, Shawn (2018), “Data for: kluster: An Efficient Scalable Procedure for Approximating the Number of Clusters in Unsupervised Learning”, Mendeley Data, v1 http://dx.doi.org/10.17632/vfx46vcwpp.1

Statistics

Views: 171
Downloads: 62

Categories

Data Science, Machine Learning, Cluster Analysis, Clinical Research Informatics

Licence

CC BY-NC 3.0

The files associated with this dataset are licensed under an Attribution-NonCommercial 3.0 Unported licence.

What does this mean?

You are free to adapt, copy, or redistribute the material, provided that you attribute it appropriately and do not use it for commercial purposes.