Generalized possibilistic fuzzy c-means with novel cluster validity indices for clustering noisy data
Description of this data
Thank you for using this code and these datasets. This description explains how the GPFCM code works. The code accompanies my paper "Generalized possibilistic fuzzy c-means with novel cluster validity indices for clustering noisy data", published in Applied Soft Computing. The main datasets used in the paper are included together with the GPFCM code.
If you have any questions, feel free to contact me at:
Guidelines for GPFCM algorithm:
- Open the file "GPFCM-Code" using MATLAB.
- DATA1 to DATA6 are the data sets used in the paper. Each data set contains the data "yd", the optimal value of ρ "ruopt", and the number of clusters "C".
- In line 13 of the code, change the number in "DATA1" to the number of the desired data set. For example, to load DATA3, change "load DATA1" to "load DATA3".
- Click anywhere in the file "GPFCM-Code" and then press "Ctrl+Enter" to run the code.
- VFCM, VPFCM, and VGPFCM, which appear in the Command Window, are the cluster centers computed by the algorithms FCM, PFCM, and GPFCM, respectively. They are also available in the MATLAB "Workspace".
- Sometimes PFCM may yield two or more coincident clusters for DATA4 or other data. GPFCM will then also give two or more coincident clusters because it starts with PFCM. In that case, run the algorithm again; a new run will probably locate all cluster centers accurately. Generally, if you use GFCM rather than GPFCM, you will get better results with no coincident clusters. The code settings for GFCM are given in item 14.
- Since the algorithm starts randomly, the order of the cluster centers may differ between runs, but their numerical values will not change considerably. For example, a cluster center obtained as the third column of the matrix VGPFCM in one run may move to another column of the matrix in a different run, while its value remains very close to the one obtained before. This is only because of the random initialization of the algorithm. Since FCM (by which GPFCM is initialized) is randomly initialized, it is sometimes sensitive to initialization (depending on the data), and there may be negligible differences between the cluster centers obtained in different runs. For example, consider DATA3 with 6 clusters. In one run we get:
-4.9960 -1.0169 -4.9708 1.9575 1.0521 -2.0271
-1.9853 -5.0464 5.9470 0.0031 6.0183 1.9896
And in another run we have:
-4.9960 -1.0169 1.9575 1.0521 -2.0271 -4.9708
-1.9853 -5.0464 0.0031 6.0183 1.9896 5.9470
The cluster centers are the same as those of the first run, but their positions in the matrix VGPFCM have changed.
- Line 46 computes the covariance norm matrix. If you uncomment line 47, the program uses the identity norm matrix (Euclidean distance) instead.
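Two of the checks described in the guidelines above can be sketched in a few lines: matching cluster centers whose column order differs between runs, and spotting coincident clusters. The following Python sketch is not part of the GPFCM MATLAB code. The function names match_centers and coincident_pairs and the tolerance tol are hypothetical, and the sketch assumes that both runs return the same number of centers, stored as columns. It uses a simple greedy nearest-center pairing, which should be adequate when the centers are well separated but is not an optimal assignment in general.

```python
import math

# Centers copied from the two DATA3 runs listed above; each column is one center.
RUN_A = [
    [-4.9960, -1.0169, -4.9708, 1.9575, 1.0521, -2.0271],
    [-1.9853, -5.0464, 5.9470, 0.0031, 6.0183, 1.9896],
]
RUN_B = [
    [-4.9960, -1.0169, 1.9575, 1.0521, -2.0271, -4.9708],
    [-1.9853, -5.0464, 0.0031, 6.0183, 1.9896, 5.9470],
]


def columns(matrix):
    """Return the columns of a row-major matrix as a list of tuples."""
    return list(zip(*matrix))


def match_centers(run_a, run_b):
    """Greedily pair each center of run_a with its nearest unused center of run_b.

    Returns (index_a, index_b, distance) tuples with 0-based column indices.
    Assumes both runs contain the same number of centers.
    """
    centers_a, centers_b = columns(run_a), columns(run_b)
    unused = set(range(len(centers_b)))
    pairs = []
    for i, a in enumerate(centers_a):
        j = min(unused, key=lambda k: math.dist(a, centers_b[k]))
        unused.remove(j)
        pairs.append((i, j, math.dist(a, centers_b[j])))
    return pairs


def coincident_pairs(run, tol=1e-3):
    """Return 0-based column index pairs whose centers are closer than tol.

    tol is a hypothetical threshold; choose it relative to the scale of the data.
    """
    centers = columns(run)
    return [
        (i, j)
        for i in range(len(centers))
        for j in range(i + 1, len(centers))
        if math.dist(centers[i], centers[j]) < tol
    ]


pairs = match_centers(RUN_A, RUN_B)
for i, j, d in pairs:
    print(f"run 1 column {i + 1} -> run 2 column {j + 1} (distance {d:.4f})")
print("coincident centers in run 1:", coincident_pairs(RUN_A))
```

For the two runs listed above, every center of the first run has an exact counterpart in the second run; for instance, the third column of the first run is the sixth column of the second. A non-empty result from coincident_pairs would suggest running the algorithm again.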
Cite this dataset
Askari Lasaki, Salar (2017), “Generalized possibilistic fuzzy c-means with novel cluster validity indices for clustering noisy data”, Mendeley Data, v1 http://dx.doi.org/10.17632/dgxfv4s5vt.1