simulated data for clustering experiments

Published: 16 Apr 2018 | Version 1 | DOI: 10.17632/mvwdtmzhcw.1

Description of this data

182 simulated datasets in two sets (the first contains small datasets, the second contains large datasets) with different cluster compositions, i.e., different numbers of clusters and separation values, generated using the clusterGeneration package in R. Each set consists of 91 datasets in comma-separated values (csv) format (182 csv files in total), with 3 to 15 clusters and separation values from 0.1 to 0.7. Separation values can range within (−0.999, 0.999), where a higher separation value indicates a cluster structure with more separable clusters.
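
As a rough sanity check on the counts above, the 91 datasets per set are consistent with a full grid of 13 cluster counts (3 to 15) and 7 separation values. The sketch below assumes the separation values advance in steps of 0.1, which the description does not state; the variable names are illustrative only.

```python
from itertools import product

# Cluster counts stated in the description: 3 through 15 inclusive.
cluster_counts = range(3, 16)

# ASSUMPTION: separation values step by 0.1 from 0.1 to 0.7.
# The description gives only the endpoints, not the step.
sepvals = [i / 10 for i in range(1, 8)]

# Every (number of clusters, separation value) combination.
grid = list(product(cluster_counts, sepvals))

print(len(cluster_counts), len(sepvals), len(grid))
```

Under that assumption the grid has 13 × 7 = 91 combinations, matching the number of files in each set.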

The size of the dataset, the number of clusters, and the separation value of the clusters are encoded in each file name, size_X_n_Y_sepval_Z.csv:
Size of the dataset = X
Number of clusters in the dataset = Y
Separation value of the clusters in the dataset = Z
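
The naming scheme can be decoded programmatically. The following is a minimal sketch, not part of the dataset: the function name `parse_dataset_filename`, the `DatasetInfo` type, and the example file name are hypothetical, and it assumes X and Y are written as plain integers and Z as a plain decimal number.

```python
import re
from typing import NamedTuple

# Pattern for size_X_n_Y_sepval_Z.csv.
# ASSUMPTION: X and Y are integers, Z is a decimal such as 0.3.
_PATTERN = re.compile(
    r"^size_(?P<size>\d+)_n_(?P<n>\d+)_sepval_(?P<sepval>-?\d+(?:\.\d+)?)\.csv$"
)


class DatasetInfo(NamedTuple):
    size: int         # X: size of the dataset
    n_clusters: int   # Y: number of clusters
    sepval: float     # Z: separation value


def parse_dataset_filename(name: str) -> DatasetInfo:
    """Extract size, cluster count, and separation value from a file name."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return DatasetInfo(
        size=int(match["size"]),
        n_clusters=int(match["n"]),
        sepval=float(match["sepval"]),
    )


# Hypothetical file name, used only to illustrate the scheme.
info = parse_dataset_filename("size_1000_n_5_sepval_0.3.csv")
print(info.size, info.n_clusters, info.sepval)
```

If the actual files format the numbers differently, the regular expression would need adjusting; names that do not match raise a `ValueError` rather than being silently skipped.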

Experiment data files

  • data
  • plots

    Cite this dataset

    Estiri, Hossein (2018), “simulated data for clustering experiments”, Mendeley Data, v1


Unsupervised Learning, Cluster Analysis, Partitioning, Cluster Testing


CC BY 4.0

The files associated with this dataset are licensed under a Creative Commons Attribution 4.0 International licence.

What does this mean?
You can share, copy and modify this dataset so long as you give appropriate credit, provide a link to the CC BY license, and indicate if changes were made, but you may not do so in a way that suggests the rights holder has endorsed you or your use of the dataset. Note that further permission may be required for any content within the dataset that is identified as belonging to a third party.