A Comparative Study of Fast and Accurate Clustering Algorithms in Multi-Sized Data Sets

Syed Quddus; Adil Bagirov

doi:10.14738/tecs.122.14317

Authors

Syed Quddus, University of the Bahamas (UB), Oakes Field, P.O. Box N-4912, Nassau, Bahamas
Adil Bagirov, Federation University, Ballarat, Vic, Australia

DOI:

https://doi.org/10.14738/tecs.122.14317

Keywords:

Cluster Analysis, Data Mining, Algorithms

Abstract

Unsupervised learning, or clustering, in large data sets is a challenging problem: most clustering algorithms are neither efficient nor accurate on such data. Developing clustering algorithms capable of solving clustering problems in large data sets is therefore very important. In this paper, we present an overview of various algorithms and approaches that have recently been used for clustering large data sets and e-documents. We use the squared Euclidean norm to define the similarity measure. We then carry out a comparative study, using Python, of the performance of the following clustering algorithms: the global k-means algorithm (GKM), the multi-start modified global k-means algorithm (MSMGKM), the multi-start k-means algorithm (MS-KM), the difference of convex clustering algorithm (DCA), and the incremental clustering algorithm based on the difference of convex representation of the cluster function and non-smooth optimization (DC-L2).

CCS Concepts: Information systems → Data mining; Information systems → Data cleaning; Information systems → Clustering.
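To make the role of the squared Euclidean norm concrete, the following is a minimal sketch of a multi-start k-means procedure in plain Python. It is an illustration only, not the authors' implementation of MS-KM: the function names (`sq_dist`, `cluster_cost`, `kmeans`, `multi_start_kmeans`), the random initialization from data points, the number of restarts, and the averaging of the cost over the data points are all assumptions made here for the example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_cost(points, centers):
    """Average squared distance from each point to its nearest center
    (assumed form of the cluster function for this sketch)."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points) / len(points)


def kmeans(points, k, rng, max_iter=100):
    """Plain k-means iterations started from k randomly chosen data points."""
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        # Assignment step: each point joins the group of its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            groups[j].append(p)
        # Update step: move each center to the mean of its group;
        # an empty group keeps its previous center.
        new_centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def multi_start_kmeans(points, k, n_starts=10, seed=0):
    """Run k-means from several random starts and keep the lowest-cost result."""
    rng = random.Random(seed)
    best_centers, best_cost = None, float("inf")
    for _ in range(n_starts):
        centers = kmeans(points, k, rng)
        cost = cluster_cost(points, centers)
        if cost < best_cost:
            best_centers, best_cost = centers, cost
    return best_centers, best_cost


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, cost = multi_start_kmeans(data, k=2)
    print(centers, cost)
```

Restarting from several initial points is a simple way to reduce the sensitivity of k-means to its starting centers; the incremental and difference-of-convex algorithms named in the abstract address the same issue by other means that this sketch does not attempt to reproduce.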
