k-Prototypes Clustering Algorithm

Published: 15 Feb 2020 | Version 2 | DOI: 10.17632/63nyn9tjcd.2

Description of this data

The functions used in this work are provided in the files "k-Prototypes Clustering" and "clustMixType modified functions". They cover obtaining and manipulating the data matrix, computing descriptive statistics, determining the best number of clusters, clustering with the k-prototypes method, and statistically validating the resulting clusters with MANOVA. An example is also included using the Iris data set, which ships with R and is widely used to illustrate and validate algorithms written in the R language.
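For readers unfamiliar with the k-prototypes step named above, the following is a minimal Python sketch of the general idea. It is not the R code shipped with this dataset. The function names, the deterministic initialisation (the first k observations serve as starting prototypes), the weight `lam`, and the mismatch-count distance for nominal variables are all assumptions made for illustration.

```python
from collections import Counter


def distance(num_x, cat_x, num_p, cat_p, lam):
    """Squared Euclidean distance on numeric values plus a weighted mismatch count."""
    numeric = sum((a - b) ** 2 for a, b in zip(num_x, num_p))
    mismatches = sum(a != b for a, b in zip(cat_x, cat_p))
    return numeric + lam * mismatches


def k_prototypes(numeric, nominal, k, lam=1.0, max_iter=100):
    """Cluster mixed-type rows; returns (labels, prototypes)."""
    # Assumption: the first k observations are the initial prototypes.
    prototypes = [(list(numeric[i]), list(nominal[i])) for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each observation joins its nearest prototype.
        new_labels = [
            min(
                range(k),
                key=lambda j: distance(x, c, prototypes[j][0], prototypes[j][1], lam),
            )
            for x, c in zip(numeric, nominal)
        ]
        if new_labels == labels:
            break  # assignments are stable, so the loop has converged
        labels = new_labels
        # Update step: numeric means and nominal modes per cluster.
        for j in range(k):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:
                continue  # an empty cluster keeps its previous prototype
            means = [
                sum(numeric[i][d] for i in members) / len(members)
                for d in range(len(numeric[0]))
            ]
            modes = [
                Counter(nominal[i][d] for i in members).most_common(1)[0][0]
                for d in range(len(nominal[0]))
            ]
            prototypes[j] = (means, modes)
    return labels, prototypes


if __name__ == "__main__":
    # Hypothetical toy data: one numeric and one nominal variable.
    labels, prototypes = k_prototypes(
        [[1.0], [1.2], [8.0], [8.3]], [["a"], ["a"], ["b"], ["b"]], k=2
    )
    print(labels, prototypes)
```

A real analysis would normally use several random starts and standardised numeric variables; the fixed initialisation here only keeps the sketch reproducible.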

The functions modified for this work are in the file "clustMixType modified functions". They are called on line 41 of the script "k-Prototypes Clustering.R".
The kproto.modif(), clprofiles.modif() and summary.kproto.modif() functions were adapted from the kproto(), clprofiles() and summary.kproto() functions, respectively, of the clustMixType package, developed by SZEPANNEK (2018). The dist.binary() function of the ade4 package, developed by DRAY & DUFOUR (2017), was also used in developing kproto.modif(), which can now use a variety of similarity functions. Distances between numerical variables are measured with the squared Euclidean distance; for nominal variables, the distance can be derived from a variety of similarity coefficients.
The fviz_cluster.modif() function was adapted from the fviz_cluster() function of the factoextra package, developed by KASSAMBARA & MUNDT (2017).
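The combination of a squared Euclidean part for numerical variables with an interchangeable similarity coefficient for nominal variables can be sketched as follows. This is a Python illustration under stated assumptions, not a port of kproto.modif() or dist.binary(): the function names, the 0/1 indicator coding of the nominal variables, the weight `lam`, and the choice of the simple matching and Jaccard coefficients as examples are all hypothetical.

```python
def simple_matching(a, b):
    """Dissimilarity 1 - s, where s is the share of positions that agree."""
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jaccard(a, b):
    """Dissimilarity 1 - s, where s ignores positions in which both values are 0."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    differ = sum(1 for x, y in zip(a, b) if x != y)
    if both + differ == 0:
        return 0.0  # two all-zero vectors are treated as identical
    return 1.0 - both / (both + differ)


def mixed_dissimilarity(num_a, num_b, bin_a, bin_b, lam=1.0, coefficient=simple_matching):
    """Squared Euclidean distance plus lam times a pluggable nominal dissimilarity."""
    if len(num_a) != len(num_b) or len(bin_a) != len(bin_b):
        raise ValueError("observations must have the same number of variables")
    numeric = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    return numeric + lam * coefficient(bin_a, bin_b)


if __name__ == "__main__":
    # Hypothetical observations: two numeric values and four 0/1 indicators each.
    x_num, y_num = [1.0, 2.0], [2.0, 4.0]
    x_bin, y_bin = [1, 0, 0, 1], [1, 0, 1, 0]
    print(mixed_dissimilarity(x_num, y_num, x_bin, y_bin, lam=2.0))
    print(mixed_dissimilarity(x_num, y_num, x_bin, y_bin, lam=2.0, coefficient=jaccard))
```

Passing the coefficient as an argument keeps the numeric part unchanged while the nominal part is swapped, which is the design idea the description attributes to the modified function.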

REFERENCES:

  • DRAY, S.; DUFOUR, A.-B. The ade4 Package: Implementing the Duality Diagram for Ecologists. Journal of Statistical Software, v.22, n.4, p.1-20, Sep. 2017. R Package version 1.7-13. Available at: https://CRAN.R-project.org/package=ade4. https://www.doi.org/10.18637/jss.v022.i04.
  • KASSAMBARA, A.; MUNDT, F. factoextra: Extract and Visualize the Results of Multivariate Data Analyses. 2017. R Package version 1.0.5. Available at: https://CRAN.R-project.org/package=factoextra.
  • SZEPANNEK, G. clustMixType: User-Friendly Clustering of Mixed-Type Data in R. The R Journal, v.10, n.2, p.200-208, 2018. R Package version 0.2-1. Available at: https://CRAN.R-project.org/package=clustMixType. https://www.doi.org/10.32614/RJ-2018-048.

Cite this dataset

Delbem Vidigal Nazareth, Ana Flávia (2020), “k-Prototypes Clustering Algorithm”, Mendeley Data, v2, http://dx.doi.org/10.17632/63nyn9tjcd.2

Statistics

Views: 26
Downloads: 14

Institutions

Universidade Federal de Ouro Preto

Categories

Clustering, Mining, Geotechnics, Applied Statistics

Licence

CC BY 4.0

The files associated with this dataset are licensed under a Creative Commons Attribution 4.0 International licence.

You can share, copy and modify this dataset so long as you give appropriate credit, provide a link to the CC BY licence, and indicate if changes were made, but you may not do so in a way that suggests the rights holder has endorsed you or your use of the dataset. Note that further permission may be required for any content within the dataset that is identified as belonging to a third party.
