Sample-Balanced and IoU-Guided Anchor-Free Visual Tracking

Authors

  • Jueyu Zhu School of Computer Science, Hunan First Normal University, Changsha 410205, Hunan, China.
  • Yu Qin School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha 410114, Hunan, China.
  • Kai Wang School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha 410114, Hunan, China.
  • Zhigao Zeng School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha 410114, Hunan, China.

DOI:

https://doi.org/10.5566/ias.2929

Keywords:

Machine vision, Target tracking, Siamese neural network, Cross-entropy, Intersection over union

Abstract

Siamese network-based visual tracking algorithms have achieved excellent performance in recent years, but challenges such as fast target motion and variations in shape and scale still make tracking difficult. Anchor-free regression has low computational complexity and strong real-time performance, which makes it well suited to visual tracking. Building on the anchor-free Siamese tracking framework, this paper first introduces balance factors and modulation coefficients into the cross-entropy loss function. These address the classification inaccuracy caused by the imbalance between positive and negative samples and between hard and easy samples during training, so that the model focuses more on the positive and hard samples that contribute most to training. Second, the intersection over union (IoU) loss function of the regression branch is improved: in addition to the IoU between the predicted box and the ground-truth box, it considers the aspect ratios of the two boxes and the area of the minimum bounding box that encloses both, which guides the generation of more accurate regression offsets. The overall classification and regression loss is minimized iteratively, improving the accuracy and robustness of visual tracking. Experiments on four public datasets, OTB2015, VOT2016, UAV123 and GOT-10k, show that the proposed algorithm achieves state-of-the-art performance.
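
The abstract does not give the exact loss formulas, so the following is only a minimal sketch of the two ideas under stated assumptions. The classification term is written in the style of the focal loss (Lin et al.), with a balance factor alpha and a modulation exponent gamma. The regression term combines 1 - IoU with an enclosing-box penalty and an aspect-ratio penalty in the style of the generalized and complete IoU losses. The function names, default values, and the way the terms are summed are hypothetical and may differ from the paper.

```python
import math


def balanced_focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Cross-entropy with a balance factor (alpha) and a modulation
    coefficient (gamma) for one sample. Assumed form, not the paper's."""
    p = min(max(p, eps), 1.0 - eps)            # keep log() finite
    p_t = p if y == 1 else 1.0 - p             # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha # weight for positive/negative
    # (1 - p_t)^gamma shrinks the loss of easy, well-classified samples
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def guided_iou_loss(pred, gt, eps=1e-7):
    """IoU loss with enclosing-box and aspect-ratio penalties.
    Boxes are (x1, y1, x2, y2). Assumed combination, not the paper's."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection and union of the two boxes
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = pw * ph + gw * gh - inter
    iou = inter / (union + eps)

    # Minimum bounding box that encloses both boxes
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    enclose = cw * ch
    enclose_penalty = (enclose - union) / (enclose + eps)

    # Aspect-ratio mismatch between the two boxes
    ratio_penalty = (4.0 / math.pi ** 2) * (
        math.atan(gw / (gh + eps)) - math.atan(pw / (ph + eps))
    ) ** 2

    return 1.0 - iou + enclose_penalty + ratio_penalty
```

In this sketch a confidently classified positive sample contributes much less to the classification loss than a misclassified one, and two disjoint boxes still receive a useful regression signal through the enclosing-box term even though their IoU is zero.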

Published

2023-11-01

Section

Original Research Paper

How to Cite

Zhu, J., Qin, Y., Wang, K., & Zeng, Z. (2023). Sample-Balanced and IoU-Guided Anchor-Free Visual Tracking. Image Analysis and Stereology, 42(3), 161-170. https://doi.org/10.5566/ias.2929