A Comparative Study of Fast and Accurate Clustering Algorithms in Multi-Sized Data Sets

Authors

  • Syed Quddus, University of the Bahamas (UB), Oakes Field, P.O. Box N-4912, Nassau, Bahamas
  • Adil Bagirov, Federation University, Ballarat, VIC, Australia

DOI:

https://doi.org/10.14738/tecs.122.14317

Keywords:

Cluster Analysis, Data Mining, Algorithms

Abstract

Unsupervised learning, or clustering, in large data sets is a challenging problem. Most clustering algorithms are neither efficient nor accurate on such data sets. Therefore, developing clustering algorithms capable of solving clustering problems in large data sets is very important. In this paper, we present an overview of various algorithms and approaches that have recently been used for clustering large data sets and e-documents. We use the squared Euclidean norm to define the similarity measure. We carry out a comparative study, using Python, of the performance of the following clustering algorithms: the global k-means algorithm (GKM), the multi-start modified global k-means algorithm (MSMGKM), the multi-start k-means algorithm (MS-KM), the difference of convex clustering algorithm (DCA), and the incremental clustering algorithm based on the difference of convex representation of the cluster function and non-smooth optimization (DC-L2). CCS Concepts: Information systems → Data mining; Information systems → Data cleaning; Information systems → Clustering.
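
To make the role of the squared Euclidean norm concrete, the sketch below evaluates a common form of the clustering objective (the mean, over all points, of the squared distance to the nearest center) and wraps a plain k-means in a multi-start loop. It is a minimal illustration in the spirit of MS-KM, not the authors' implementation: the function names, the random initialization, the number of restarts, and the stopping rule are all assumptions made for this example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cluster_function(points, centers):
    """Mean squared distance from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points) / len(points)

def kmeans(points, k, rng, max_iter=100):
    """Plain k-means started from k distinct points chosen at random (assumed init)."""
    centers = rng.sample(points, k)
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            groups[j].append(p)
        # Update step: move each center to the mean of its group;
        # an empty group keeps its previous center.
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers

def multi_start_kmeans(points, k, n_starts=10, seed=0):
    """Run k-means from several random starts; keep the lowest objective value."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_starts):
        centers = kmeans(points, k, rng)
        value = cluster_function(points, centers)
        if best is None or value < best[1]:
            best = (centers, value)
    return best

if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    centers, value = multi_start_kmeans(data, k=2)
    print(sorted(centers), value)
```

Restarting from several initial points is the simplest way to reduce the sensitivity of k-means to its starting centers. The incremental and difference-of-convex approaches named in the abstract address the same sensitivity by other means and are not reproduced here.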

Published

2024-04-13

How to Cite

Quddus, S., & Bagirov, A. (2024). A Comparative Study of Fast and Accurate Clustering Algorithms in Multi-Sized Data Sets. Transactions on Engineering and Computing Sciences, 12(2), 110–120. https://doi.org/10.14738/tecs.122.14317