Gaussian Mixture high-dimensional datasets

Published: 25 June 2020 | Version 1 | DOI: 10.17632/xj9vyybgzn.1
Contributor:
Fernanda Eustáquio

Description

Seven datasets, each consisting of c Gaussian-shaped clusters. For each dataset, n points with p dimensions were generated from a mixture of c Gaussian distributions (clusters). The means of each cluster were randomly generated as numbers between 0 and 10 and arranged in a c x p matrix. The spread of each cluster is described by a p x p covariance matrix generated from normally distributed random numbers. Finally, from the c x p matrix of means and the c covariance matrices of size p x p, the mvrnorm function of the MASS library in R was used to produce the samples that compose each Gaussian mixture dataset.

The properties of the seven Gaussian mixture datasets:

Dataset | p | n | c
Gaussian.k8 | 657 | 81 | 8
Gaussian.k2 | 4,232 | 181 | 2
Gaussian.k7 | 4,514 | 128 | 7
Gaussian.k9 | 5,041 | 108 | 9
Gaussian.k6 | 5,176 | 143 | 6
Gaussian.k5 | 6,203 | 130 | 5
Gaussian.k4 | 6,615 | 168 | 4

The cluster label of each object is in the last column of the dataset. Columns are separated by commas, and there are no column or row names.
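
The generation procedure and file layout described above can be sketched roughly as follows. This is an illustration in Python with NumPy, not the R script that produced the datasets: rng.multivariate_normal stands in for mvrnorm, and several details are assumptions that the description does not specify. These are the function names sample_gaussian_mixture and split_features_labels, the near-equal cluster sizes, the integer labels 1..c, and the construction of each covariance matrix as A times its transpose (chosen here only so that the random matrix is symmetric and positive semi-definite).

```python
import io

import numpy as np


def sample_gaussian_mixture(n, p, c, seed=0):
    """Draw n points in p dimensions from c Gaussian clusters.

    Returns an (n, p + 1) array whose last column is the cluster label.
    Hypothetical helper: sizes, labels and covariance construction are assumptions.
    """
    rng = np.random.default_rng(seed)
    # Cluster means: a c x p matrix of numbers between 0 and 10.
    means = rng.uniform(0.0, 10.0, size=(c, p))
    # Assumption: points are spread as evenly as possible, labels are 1..c.
    labels = np.arange(n) % c + 1
    features = np.empty((n, p))
    for k in range(c):
        # Assumption: build a valid covariance from normal random numbers
        # as A @ A.T, which is symmetric positive semi-definite.
        a = rng.standard_normal((p, p))
        cov = a @ a.T / p
        idx = np.where(labels == k + 1)[0]
        features[idx] = rng.multivariate_normal(means[k], cov, size=idx.size)
    return np.column_stack([features, labels])


def split_features_labels(data):
    """Separate the feature columns from the label in the last column."""
    return data[:, :-1], data[:, -1].astype(int)


# Small demonstration sizes; the published datasets are far larger in p.
data = sample_gaussian_mixture(n=20, p=5, c=4)

# Write and re-read in the described layout: comma-separated, no header,
# no row names. An in-memory buffer is used instead of a real file.
buffer = io.StringIO()
np.savetxt(buffer, data, delimiter=",")
buffer.seek(0)
loaded = np.loadtxt(buffer, delimiter=",")

X, y = split_features_labels(loaded)
```

For one of the published files, the same np.loadtxt call with delimiter="," applied to the downloaded file path should yield an array with p + 1 columns, assuming the labels are stored as numbers; if they are stored as text, a CSV reader that handles mixed column types would be needed instead.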

Files