Clustering Practices in Missing Value Data Sets

Authors

  • Serpil SEVİMLİ DENİZ Van Yüzüncü Yıl University, Gevaş Vocational School, Computer Programming Department
  • H. Eray ÇELİK Van Yüzüncü Yıl University, Faculty of Economics and Administrative Sciences, Department of Econometrics
  • Çağdaş Hakan ALADAĞ Hacettepe University, Faculty of Science, Department of Statistics

DOI:

https://doi.org/10.38063/ejons.365

Keywords:

Missing Data, clustering, SOM, LVQ, k-means

Abstract

Missing data occur when one or more values in a data set cannot be obtained. The purpose of cluster analysis is to give the researcher summary information by grouping observations according to their similarities, reducing a large number of observations to a smaller number of groups. In this study, the performance of three clustering methods is compared at different missing data rates on eleven separate data sets consisting of numerical and nominal data. Correct clustering rates were examined after deleting 5%, 10%, 15%, 20%, 25%, and 30% of the data in each data set completely at random. The methods tested were k-means, a partitional clustering method, and two artificial neural network-based methods: the Self-Organizing Map (SOM) and Learning Vector Quantization (LVQ). The results show that the correct clustering rate decreases as the missing data rate increases. The LVQ method performed better on four data sets, two of which contain both nominal and numerical data, while the SOM method clustered better on the other seven data sets, which consist of numerical data.
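The deletion procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' code. The names `mask_mcar` and `correct_clustering_rate` are hypothetical, the sketch assumes that values are deleted cell by cell, and it assumes that the correct clustering rate is obtained by mapping each cluster to its majority class. The abstract specifies neither detail.

```python
import random
from collections import Counter

# Missing-data rates named in the abstract.
RATES = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30]


def mask_mcar(rows, rate, seed=0):
    """Return a copy of `rows` with a fraction `rate` of cells set to None.

    Cells are drawn uniformly at random (missing completely at random).
    Cell-level deletion is an assumption of this sketch.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(rows), len(rows[0])
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    n_missing = round(rate * len(cells))
    masked = [list(row) for row in rows]  # leave the input untouched
    for i, j in rng.sample(cells, n_missing):
        masked[i][j] = None
    return masked


def correct_clustering_rate(true_labels, cluster_ids):
    """Share of points whose class equals the majority class of their cluster.

    Majority-vote mapping is an assumption, not the paper's stated metric.
    """
    by_cluster = {}
    for label, cid in zip(true_labels, cluster_ids):
        by_cluster.setdefault(cid, Counter())[label] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_cluster.values())
    return correct / len(true_labels)


if __name__ == "__main__":
    # Toy 50 x 4 numeric table standing in for a real data set.
    data = [[float(i + j) for j in range(4)] for i in range(50)]
    for rate in RATES:
        masked = mask_mcar(data, rate, seed=42)
        n_missing = sum(v is None for row in masked for v in row)
        print(f"rate={rate:.2f} missing cells={n_missing}")
```

In a full experiment, each masked table would be passed to a clustering method (after whatever missing-value handling that method requires), and the resulting cluster assignments would be scored against the known classes at every rate.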

Published

2020-12-15

How to Cite

SEVİMLİ DENİZ, S., ÇELİK, H. E., & ALADAĞ, Ç. H. (2020). Clustering Practices in Missing Value Data Sets. Ejons International Journal on Mathematic, Engineering and Natural Sciences, 4(16), 998–1004. https://doi.org/10.38063/ejons.365