Survey on Object Tracking using Deep Learning Paradigms
Keywords: Object Tracking, Deep Learning, CNN, RNN, Autoencoders
Object tracking spans many application domains and plays a central role in image processing and pattern recognition. It is the process of following an object across a continuous sequence of image frames to determine its relative movement or changes over time. With the rapid advances in deep learning, deep neural networks have been increasingly adopted because of their strong results in object detection and tracking. This survey gives a comprehensive overview of recent work in object tracking, with a focus on deep learning techniques and algorithms. The paper is divided into four sections. The first gives an overview of recent work and highlights the deep learning techniques and methods that have been used for object tracking. The second section focuses on object tracking with convolutional neural networks. The third section focuses on approaches that use recurrent neural networks to track objects. The final section concentrates on autoencoder-based object tracking.