A Novel Approach to Distributed Multi-Class SVM
Abstract
With data sizes constantly expanding, the classical machine learning algorithms that analyze such data require ever larger amounts of computation time and storage space, so the need to distribute computation and memory among several computers has become apparent. Although substantial work has been done on distributed binary SVM algorithms and on multi-class SVM algorithms individually, distributed multi-class SVMs remain largely unexplored. This research proposes a novel algorithm that implements the Support Vector Machine over a multi-class dataset and runs efficiently in a distributed environment (here, Hadoop). During the training phase, the dataset is recursively divided in half and an optimal Support Vector Machine is computed at each split, in a divide-and-conquer fashion. During testing, this structure is exploited to significantly reduce prediction time. Our algorithm achieves lower computation time in the prediction phase than the traditional sequential multi-class SVM methods (One vs. One, One vs. Rest) and outperforms them as the dataset grows larger. It also classifies the data with higher accuracy than these traditional multi-class algorithms.
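The abstract does not say exactly how the dataset is halved, so the following is only a minimal single-machine sketch under stated assumptions. It assumes the sorted set of class labels is split in half at each node (a hypothetical split rule), trains one binary linear SVM per node with dual coordinate descent on the hinge loss (a stand-in for whatever SVM solver the authors used), and predicts by walking a single root-to-leaf path. The names `train_linear_svm`, `build_tree`, `predict` and the toy dataset are illustrative, and no Hadoop code is involved.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def train_linear_svm(xs: Sequence[Vector], ys: Sequence[int],
                     c: float = 10.0, epochs: int = 200) -> Vector:
    """Binary hinge-loss linear SVM solved by dual coordinate descent.

    Labels must be -1/+1. A constant feature 1.0 is appended to every
    sample so the last weight acts as the bias.
    """
    aug = [list(x) + [1.0] for x in xs]
    w = [0.0] * len(aug[0])
    alpha = [0.0] * len(aug)
    q = [sum(v * v for v in x) for x in aug]  # diagonal of the Gram matrix
    for _ in range(epochs):
        for i, (x, y) in enumerate(zip(aug, ys)):
            grad = y * sum(wj * xj for wj, xj in zip(w, x)) - 1.0
            # Exact minimization along alpha_i, projected onto [0, c].
            new_alpha = min(max(alpha[i] - grad / q[i], 0.0), c)
            step = (new_alpha - alpha[i]) * y
            w = [wj + step * xj for wj, xj in zip(w, x)]
            alpha[i] = new_alpha
    return w


def decision(w: Vector, x: Vector) -> float:
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))


@dataclass
class Node:
    labels: List[int]
    w: Optional[Vector] = None  # None marks a leaf
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(xs: Sequence[Vector], ys: Sequence[int], labels: List[int]) -> Node:
    """Recursively halve the label set, training one binary SVM per split."""
    if len(labels) == 1:
        return Node(labels)
    half = len(labels) // 2
    left_labels, right_labels = labels[:half], labels[half:]
    left_set = set(left_labels)
    # Classes in the left half become -1, classes in the right half +1.
    w = train_linear_svm(xs, [-1 if y in left_set else 1 for y in ys])
    # The two subtrees share no state, so they could be trained independently.
    lx = [x for x, y in zip(xs, ys) if y in left_set]
    ly = [y for y in ys if y in left_set]
    rx = [x for x, y in zip(xs, ys) if y not in left_set]
    ry = [y for y in ys if y not in left_set]
    return Node(labels, w, build_tree(lx, ly, left_labels),
                build_tree(rx, ry, right_labels))


def predict(root: Node, x: Vector) -> Tuple[int, int]:
    """Walk root -> leaf; return (label, number of binary SVMs evaluated)."""
    node, evaluated = root, 0
    while node.w is not None:
        evaluated += 1
        node = node.left if decision(node.w, x) < 0 else node.right
    return node.labels[0], evaluated


# Toy data: four classes at the corners of a square, four points each.
centers = {0: (0.0, 0.0), 1: (0.0, 3.0), 2: (3.0, 0.0), 3: (3.0, 3.0)}
offsets = [(0.3, 0.3), (0.3, -0.3), (-0.3, 0.3), (-0.3, -0.3)]
xs = [[cx + dx, cy + dy] for cx, cy in centers.values() for dx, dy in offsets]
ys = [label for label in centers for _ in offsets]
tree = build_tree(xs, ys, sorted(set(ys)))

k = len(centers)
label, evaluated = predict(tree, [2.9, 3.2])
# With 4 classes the tree evaluates 2 SVMs, versus k = 4 (OvR) and k(k-1)/2 = 6 (OvO).
print(f"label={label} tree={evaluated} OvR={k} OvO={k * (k - 1) // 2}")
```

Because the two recursive calls in `build_tree` share no state, each subtree could in principle be trained as a separate job on a different worker; this sketch runs them sequentially and does not model the Hadoop implementation. At prediction time only one root-to-leaf path (about log2 k binary decisions for k classes) is evaluated, versus k classifiers for One vs. Rest and k(k-1)/2 for One vs. One.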