Clustering Practices in Missing Value Data Sets
DOI:
https://doi.org/10.38063/ejons.365
Keywords:
Missing Data, clustering, SOM, LVQ, k-means
Abstract
Missing data arise when one or more values in a data set cannot be obtained. The purpose of cluster analysis is to give the researcher summary information by grouping the data according to their similarities, reducing a large number of observations to a smaller number of groups. In this study, the performance of three clustering methods is compared at different missing data rates on eleven separate data sets consisting of numerical and nominal data. Correct clustering rates were examined after removing five, ten, fifteen, twenty, twenty-five and thirty percent of the data from each data set completely at random. The methods tested on missing data are k-means, a partitional clustering method; the Self-Organizing Map (SOM), an artificial neural network-based clustering method; and Learning Vector Quantization (LVQ). The results show that the correct clustering rate decreases as the missing data rate increases. LVQ performed better on the four data sets that contain both nominal and numerical data, while SOM clustered better on the other seven data sets, which consist of numerical data.
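The evaluation protocol described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code, and several details are assumptions: the abstract does not say whether the missing rate applies to individual cells or to whole records (cells are assumed here), and it does not define how the correct clustering rate is computed (a majority-class mapping per cluster is assumed). The function names `mask_completely_at_random` and `correct_cluster_rate` are hypothetical, and the toy data are not from the study.

```python
import random
from collections import Counter

# Missing-data rates named in the abstract.
RATES = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30]


def mask_completely_at_random(rows, rate, seed=0):
    """Return a copy of `rows` with a share `rate` of all cells set to None.

    Assumption: the rate is applied to cells, chosen uniformly at random.
    """
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(rows)) for j in range(len(rows[0]))]
    n_missing = round(rate * len(cells))
    masked = [list(row) for row in rows]  # leave the input untouched
    for i, j in rng.sample(cells, n_missing):
        masked[i][j] = None
    return masked


def correct_cluster_rate(true_labels, cluster_ids):
    """Share of records that belong to the majority class of their cluster.

    Assumption: each cluster is mapped to its most frequent true class.
    """
    by_cluster = {}
    for label, cid in zip(true_labels, cluster_ids):
        by_cluster.setdefault(cid, Counter())[label] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_cluster.values())
    return correct / len(true_labels)


if __name__ == "__main__":
    # Toy data for illustration only.
    data = [[float(i), float(i % 3)] for i in range(20)]
    for rate in RATES:
        masked = mask_completely_at_random(data, rate)
        n_none = sum(v is None for row in masked for v in row)
        print(f"rate={rate:.2f} missing cells={n_none}")
```

In a full experiment, each masked data set would be passed to k-means, SOM and LVQ, and `correct_cluster_rate` would compare the resulting cluster assignments with the known class labels at every rate.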
License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.