A Deep Hyper Siamese Network for Real-Time Object Tracking

  • Yongpeng Zhao zhaoyongpeng@csu.edu.cn
  • Lasheng Yu
  • Xiaopeng Zheng
Keywords: Deep Learning, Siamese Network, Visual Object Tracking, SiamFC, ResNet

Abstract

Siamese networks have drawn increasing interest in visual object tracking because they balance precision and efficiency. However, Siamese trackers use relatively shallow backbone networks, such as AlexNet, and therefore do not take full advantage of modern deep convolutional neural networks (CNNs). Moreover, a Siamese tracker extracts the feature representation of the target object from the last layer of the CNN, which mainly captures semantic information; as a result, the tracker's precision is relatively low and it drifts easily in the presence of similar distractors. In this paper, a new nonpadding residual unit (NPRU) is designed and stacked to form a 22-layer deep ResNet, referred to as ResNet22. Using ResNet22 as the backbone network, we build a deep Siamese network that greatly enhances tracking performance. Because feature maps at different levels of the CNN represent different aspects of the target object, we aggregate several deep convolutional layers to exploit ResNet22's multilevel feature maps and form hyperfeature representations of targets. The resulting deep hyper Siamese network is named DHSiam. Experimental results show that DHSiam achieves significant improvement on multiple benchmark datasets.
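
The matching step that SiamFC-style trackers build on, and the idea of aggregating multilevel features, can be illustrated with a small pure-Python sketch. It is not the DHSiam implementation: the function names (`cross_correlate`, `center_crop`, `hyperfeature`, `peak`) are hypothetical, feature maps are toy nested lists shaped [channels][height][width], and the aggregation rule (center-crop every level to a common size, then concatenate channels) is an assumption made only for illustration, since the abstract does not state how the layers are combined.

```python
def cross_correlate(exemplar, search):
    """Slide the exemplar features over the search features without padding
    and return a 2-D response map (one similarity score per offset)."""
    channels = len(exemplar)
    eh, ew = len(exemplar[0]), len(exemplar[0][0])
    sh, sw = len(search[0]), len(search[0][0])
    response = []
    for y in range(sh - eh + 1):
        row = []
        for x in range(sw - ew + 1):
            score = 0.0
            for c in range(channels):
                for dy in range(eh):
                    for dx in range(ew):
                        score += exemplar[c][dy][dx] * search[c][y + dy][x + dx]
            row.append(score)
        response.append(row)
    return response


def center_crop(feature, size):
    """Crop every channel to a centered size x size window."""
    h, w = len(feature[0]), len(feature[0][0])
    top, left = (h - size) // 2, (w - size) // 2
    return [[r[left:left + size] for r in ch[top:top + size]] for ch in feature]


def hyperfeature(levels, size):
    """Assumed aggregation: crop each level to a common spatial size and
    concatenate along the channel axis."""
    merged = []
    for feature in levels:
        merged.extend(center_crop(feature, size))
    return merged


def peak(response):
    """Return (row, column) of the highest score in the response map."""
    best = max((v, y, x) for y, r in enumerate(response) for x, v in enumerate(r))
    return best[1], best[2]


# Toy example: a 2x2 exemplar matched against a 4x4 search region.
exemplar = [[[1, 1], [1, 1]]]
search = [[[0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 0]]]
target_offset = peak(cross_correlate(exemplar, search))  # (1, 2)
```

In a tracker, the exemplar and search feature maps would come from the two branches of the shared backbone, and the peak of the response map would indicate the most likely target location in the search region.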

Published
2020-04-30
How to Cite
Zhao, Y., Yu, L., & Zheng, X. (2020). A Deep Hyper Siamese Network for Real-Time Object Tracking. Transactions on Machine Learning and Artificial Intelligence, 8(1), 35-46. https://doi.org/10.14738/tmlai.81.8020