SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking

Hou, Xiaojun; Xing, Jiazheng; Qian, Yijie; Guo, Yaowei; Xin, Shuo; Chen, Junhao; Tang, Kai; Wang, Mengmeng; Jiang, Zhengkai; Liu, Liang; Liu, Yong

(Submitted on 24 Mar 2024 (v1), last revised 28 Mar 2024 (this version, v2))

Abstract: Multimodal Visual Object Tracking (VOT) has recently gained significant attention due to its robustness. Early research focused on fully fine-tuning RGB-based trackers, which was inefficient and lacked generalized representation due to the scarcity of multimodal data. Therefore, recent studies have utilized prompt tuning to transfer pre-trained RGB-based trackers to multimodal data. However, the modality gap limits pre-trained knowledge recall, and the dominance of the RGB modality persists, preventing the full utilization of information from other modalities. To address these issues, we propose a novel symmetric multimodal tracking framework called SDSTrack. We introduce lightweight adaptation for efficient fine-tuning, which directly transfers the feature extraction ability from RGB to other domains with a small number of trainable parameters and integrates multimodal features in a balanced, symmetric manner. Furthermore, we design a complementary masked patch distillation strategy to enhance the robustness of trackers in complex environments, such as extreme weather, poor imaging, and sensor failure. Extensive experiments demonstrate that SDSTrack outperforms state-of-the-art methods in various multimodal tracking scenarios, including RGB+Depth, RGB+Thermal, and RGB+Event tracking, and exhibits impressive results in extreme conditions. Our source code is available at this https URL

Comments:	Accepted by CVPR 2024
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2403.16002 [cs.CV]
	(or arXiv:2403.16002v2 [cs.CV] for this version)

Submission history

From: Xiaojun Hou
[v1] Sun, 24 Mar 2024 04:15:50 GMT (47029kb,D)
[v2] Thu, 28 Mar 2024 03:22:52 GMT (16772kb,D)
