k-Prototypes Clustering Algorithm

Published: 15 February 2020 | Version 2 | DOI: 10.17632/63nyn9tjcd.2
Ana Flávia Delbem Vidigal Nazareth


The functions used in this work are provided in two files, "k-Prototypes Clustering" and "clustMixType modified functions". Together they read and manipulate the data matrix, compute descriptive statistics, determine the best number of clusters, cluster the data with the k-prototypes method, and statistically validate the resulting clusters with MANOVA. An example is also included that uses the Iris dataset, which ships with R and is widely used to demonstrate and validate algorithms written in the R language.

The functions modified for this work are in the file "clustMixType modified functions". They are called from line 41 of the script "k-Prototypes Clustering.R". The kproto.modif(), clprofiles.modif() and summary.kproto.modif() functions were adapted from the kproto(), clprofiles() and summary.kproto() functions, respectively, of the clustMixType package developed by SZEPANNEK (2018). The kproto.modif() function also uses the dist.binary() function of the ade4 package, developed by DRAY & DUFOUR (2017), so that it can work with a variety of similarity coefficients. The dissimilarity between observations combines the squared Euclidean distance, computed over the numerical variables, with a distance for the nominal variables that can be derived from any of several similarity coefficients. Finally, the fviz_cluster.modif() function was adapted from the fviz_cluster() function of the factoextra package, developed by KASSAMBARA & MUNDT (2017).

REFERENCES:
- DRAY, S.; DUFOUR, A.-B. The ade4 Package: Implementing the Duality Diagram for Ecologists. Journal of Statistical Software, v.22, n.4, p.1-20, Sep. 2017. R Package version 1.7-13. Available at: https://CRAN.R-project.org/package=ade4. https://www.doi.org/10.18637/jss.v022.i04.
- KASSAMBARA, A.; MUNDT, F. factoextra: Extract and Visualize the Results of Multivariate Data Analyses. 2017. R Package version 1.0.5. Available at: https://CRAN.R-project.org/package=factoextra.
- SZEPANNEK, G. clustMixType: User-Friendly Clustering of Mixed-Type Data in R. The R Journal, v.10, n.2, p.200-208, 2018. R Package version 0.2-1. Available at: https://CRAN.R-project.org/package=clustMixType. https://www.doi.org/10.32614/RJ-2018-048.
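
For reference, the dissimilarity minimised by the standard k-prototypes method, as implemented in kproto() of clustMixType, is written below for an observation x_i and a cluster prototype p_l, assuming the first q of the m variables are numerical and the remaining ones nominal. Here λ weights the nominal part and δ is the simple-matching mismatch indicator. The description above says kproto.modif() can replace this mismatch term with a distance derived from other similarity coefficients; the exact transformation it applies is not reproduced here.

```latex
\begin{aligned}
d(x_i, p_l) &= \sum_{j=1}^{q} (x_{ij} - p_{lj})^2 + \lambda \sum_{j=q+1}^{m} \delta(x_{ij}, p_{lj}),
  \qquad \delta(a, b) = \begin{cases} 0, & a = b \\ 1, & a \neq b \end{cases} \\
E &= \sum_{l=1}^{k} \sum_{i \in C_l} d(x_i, p_l)
\end{aligned}
```

Each numerical entry of a prototype is the mean of its cluster C_l and each nominal entry is the mode; the method alternates between assigning observations and updating prototypes so as to reduce the total cost E.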

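The k-prototypes clustering step described above can be illustrated with a minimal Python sketch. It is not the R code of kproto() or kproto.modif(); the function names (mixed_dissimilarity, update_prototype, k_prototypes) are hypothetical, it assumes the simple-matching coefficient for the nominal variables, uses a fixed user-chosen weight lam, and runs a batch variant (assign every observation, then recompute all prototypes) until assignments stop changing.

```python
import random
from collections import Counter


def mixed_dissimilarity(x_num, x_cat, p_num, p_cat, lam):
    """Squared Euclidean distance over the numerical variables plus lam times
    the number of mismatching nominal variables (simple matching, assumed)."""
    num_part = sum((a - b) ** 2 for a, b in zip(x_num, p_num))
    cat_part = sum(1 for a, b in zip(x_cat, p_cat) if a != b)
    return num_part + lam * cat_part


def update_prototype(members_num, members_cat):
    """Prototype = mean of each numerical variable, mode of each nominal one."""
    n = len(members_num)
    means = [sum(col) / n for col in zip(*members_num)]
    modes = [Counter(col).most_common(1)[0][0] for col in zip(*members_cat)]
    return means, modes


def k_prototypes(x_num, x_cat, k, lam=1.0, max_iter=100, seed=0):
    """Hypothetical batch k-prototypes: returns (labels, prototypes, total cost)."""
    rng = random.Random(seed)
    # Initialise the prototypes with k distinct observations chosen at random.
    start = rng.sample(range(len(x_num)), k)
    protos = [(list(x_num[i]), list(x_cat[i])) for i in start]
    labels = [None] * len(x_num)
    for _ in range(max_iter):
        # Assignment step: each observation goes to its nearest prototype.
        new_labels = [
            min(range(k), key=lambda c: mixed_dissimilarity(xn, xc, *protos[c], lam))
            for xn, xc in zip(x_num, x_cat)
        ]
        if new_labels == labels:  # converged: no observation changed cluster
            break
        labels = new_labels
        # Update step: recompute the prototype of every non-empty cluster.
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if members:  # an empty cluster keeps its previous prototype
                protos[c] = update_prototype(
                    [x_num[i] for i in members], [x_cat[i] for i in members]
                )
    # Total cost E: sum of dissimilarities to the assigned prototypes.
    cost = sum(
        mixed_dissimilarity(xn, xc, *protos[lab], lam)
        for xn, xc, lab in zip(x_num, x_cat, labels)
    )
    return labels, protos, cost


# Toy data made up for illustration: two numerical variables, one nominal.
x_num = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8], [8.0, 8.2], [8.1, 7.9], [7.8, 8.0]]
x_cat = [["a"], ["a"], ["a"], ["b"], ["b"], ["b"]]
labels, protos, cost = k_prototypes(x_num, x_cat, k=2, lam=1.0)
print(labels, round(cost, 3))
```

Because this kind of alternating update can stop at a local optimum, restarting from several random initialisations and keeping the lowest cost is a common remedy. The weight lam matters when numerical and nominal variables are on different scales; clustMixType provides a lambdaest() helper for estimating it from the data.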


Universidade Federal de Ouro Preto


Clustering, Mining, Geotechnics, Applied Statistics