Image Category Recognition using Bag of Visual Words Representation
Abstract

Image category recognition is a challenging task because of variation in image background, illumination, scale, clutter, rotation, and other factors. The Bag-of-Visual-Words (BoVW) model is considered the standard approach to image categorization. The performance of BoVW depends mainly on the local features extracted from the images. In this paper, a novel BoVW representation approach that uses Compressed Local Retinal Features (CLRF) for image categorization is proposed. CLRF takes interest point regions from images and transforms them to log-polar form. A two-dimensional Discrete Wavelet Transform (2D DWT) is then applied to compress the log-polar form, and the result is used as the feature for each interest region. These features are used to build a visual vocabulary with the k-means clustering algorithm. The visual vocabulary is then used to form a histogram representation of each image, and the images are classified using a Support Vector Machine (SVM) classifier. The performance of the proposed BoVW framework is evaluated on the SIMPLIcity and butterflies datasets. The experimental results show that the proposed BoVW approach using CLRF is very competitive with state-of-the-art methods.
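The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the keypoints are random stand-ins for an interest point detector, the log-polar grid is 16 x 16 with nearest-neighbour sampling, the wavelet is a single-level Haar transform of which only the approximation band is kept, and the helper names (log_polar_patch, haar_approximation, describe, kmeans, assign, bovw_histogram) are hypothetical.

```python
import numpy as np

def log_polar_patch(image, center, radius, n_rho=16, n_theta=16):
    """Resample a circular region around `center` onto a (log-radius, angle) grid."""
    cy, cx = center
    # Radii spaced logarithmically from 1 pixel out to `radius` (assumed grid).
    rho = np.exp(np.linspace(0.0, np.log(radius), n_rho))
    theta = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    ys = cy + rho[:, None] * np.sin(theta)[None, :]
    xs = cx + rho[:, None] * np.cos(theta)[None, :]
    # Nearest-neighbour sampling, clipped to the image bounds.
    ys = np.clip(np.rint(ys).astype(int), 0, image.shape[0] - 1)
    xs = np.clip(np.rint(xs).astype(int), 0, image.shape[1] - 1)
    return image[ys, xs].astype(float)

def haar_approximation(patch):
    """One level of a 2D Haar DWT, keeping only the approximation (low-low) band."""
    rows = (patch[0::2, :] + patch[1::2, :]) / np.sqrt(2.0)
    return (rows[:, 0::2] + rows[:, 1::2]) / np.sqrt(2.0)

def describe(image, keypoints, radius=8):
    """Compressed log-polar descriptor for each (row, col) keypoint."""
    return np.array([
        haar_approximation(log_polar_patch(image, kp, radius)).ravel()
        for kp in keypoints
    ])

def assign(data, centers):
    """Index of the nearest center (squared Euclidean distance) for each row."""
    dist = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

def kmeans(data, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; the returned centers act as the visual vocabulary."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(data, centers)
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:  # leave an empty cluster's center unchanged
                centers[j] = members.mean(axis=0)
    return centers

def bovw_histogram(descriptors, centers):
    """Normalized count of visual words occurring in one image."""
    labels = assign(descriptors, centers)
    hist = np.bincount(labels, minlength=len(centers)).astype(float)
    return hist / hist.sum()

# Toy data: a random image and random keypoints (stand-ins, not real detections).
rng = np.random.default_rng(0)
image = rng.random((64, 64))
keypoints = [(int(r), int(c)) for r, c in rng.integers(8, 56, size=(30, 2))]

descriptors = describe(image, keypoints)      # one 8 x 8 = 64-value vector per keypoint
vocabulary = kmeans(descriptors, k=5)         # 5 visual words
histogram = bovw_histogram(descriptors, vocabulary)
```

In a full system the vocabulary would be learned from descriptors pooled over all training images, and the per-image histograms would then be passed to an SVM classifier, for example scikit-learn's SVC, which is omitted here to keep the sketch dependency-free.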