SOTVerse

SOTVerse is a user-defined task space for single object tracking (SOT). It lets users customize SOT tasks according to their research purposes, which makes research more targeted and can significantly improve research efficiency. VLTVerse is the first fine-grained evaluation framework for vision-language tracking (VLT); it comprehensively considers multiple challenge factors and diverse semantic information, with the aim of revealing the role of language in VLT.

Main Contributions

A 3E paradigm to describe computer vision tasks.

The 3E paradigm describes computer vision tasks by environment, evaluation, and executor: we combine the environment and the evaluation to form SOTVerse, a user-defined single object tracking task space, and conduct experiments in this space to judge executors' tracking ability. The paradigm can also be extended to describe other visual tasks comprehensively and to help users improve their research efficiency.

A comprehensive and user-defined environment.

We organize existing benchmarks to form the environment of SOTVerse, which includes 12.56 million frames and frame-level challenge attribute labels to model the real world. In addition, an environment generation method is available to help researchers efficiently form their own task space.

A thorough evaluation scheme.

We first point out the limitations of existing evaluation systems and indicators through detailed analysis, and then design a new evaluation scheme for SOTVerse, which includes two mechanisms and new metrics to suit various tasks.

A fine-grained evaluation.

VLTVerse introduces 10 sequence-level challenge labels and 6 types of multi-granularity semantic information, creating a flexible and multi-dimensional evaluation space for VLT.

Various experimental executors and detailed analysis.

We conduct extensive experiments in SOTVerse and VLTVerse and analyze the performance of various executors. The experimental results indicate the shortcomings of existing work and verify the effectiveness of the evaluation schemes in SOTVerse and VLTVerse.
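
As a rough illustration of how the three parts of the 3E paradigm could fit together, the sketch below selects frames by their challenge attribute labels (environment), scores predictions with a conventional IoU-based success rate (evaluation), and treats a tracker as a plain callable (executor). It is not the SOTVerse toolkit API: the names Frame, build_environment, and success_rate, the attribute strings, and the 0.5 threshold are all assumptions made for this example, and the success rate shown is a standard tracking metric rather than one of the new SOTVerse metrics.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


@dataclass(frozen=True)
class Frame:
    """One annotated frame (hypothetical structure, not the toolkit's)."""
    sequence: str
    index: int
    gt_box: Box
    attributes: FrozenSet[str]  # frame-level challenge attribute labels


def build_environment(frames: Iterable[Frame],
                      required: Iterable[str]) -> List[Frame]:
    """Environment: keep only frames carrying every requested attribute."""
    wanted = frozenset(required)
    return [f for f in frames if wanted <= f.attributes]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_rate(environment: List[Frame],
                 executor: Callable[[Frame], Box],
                 threshold: float = 0.5) -> float:
    """Evaluation: fraction of frames whose predicted box reaches the IoU threshold."""
    if not environment:
        return 0.0
    hits = sum(iou(executor(f), f.gt_box) >= threshold for f in environment)
    return hits / len(environment)


# Toy data; the attribute names are placeholders, not SOTVerse labels.
frames = [
    Frame("seq_a", 0, (0, 0, 10, 10), frozenset({"fast_motion"})),
    Frame("seq_a", 1, (5, 5, 10, 10), frozenset({"fast_motion", "occlusion"})),
    Frame("seq_b", 0, (20, 20, 10, 10), frozenset()),
]

# Executor: a trivial stand-in that always predicts the same box.
def static_executor(frame: Frame) -> Box:
    return (0.0, 0.0, 10.0, 10.0)

env = build_environment(frames, {"fast_motion"})
print(len(env), success_rate(env, static_executor))
```

A real tracker keeps state across frames, so an actual executor would be initialized on the first frame of each sequence instead of being called on frames independently; the stateless callable here only keeps the example short.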

Latest News

[2024.11.24] We have proposed VLTVerse, a fine-grained evaluation that reveals the role of language in vision-language tracking, for robust VLT research. VLTVerse, the toolkit, and the results will be available soon.
[2023.12.20] We have proposed BioDrone, a bionic drone-based single object tracking benchmark for robust vision research. You can now download the dataset from the download page via the URL. The BioDrone paper has been accepted by the International Journal of Computer Vision (IJCV)!
[2023.09.22] We have proposed MGIT, a new multi-modal global instance tracking benchmark. You can now download the dataset from the download page via the URLs of VideoCube-Tiny. The MGIT paper has been accepted by the 37th Conference on Neural Information Processing Systems (NeurIPS 2023), Datasets and Benchmarks Track!
[2023.09.12] The SOTVerse paper has been accepted by the International Journal of Computer Vision (IJCV)! The paper, experimental results, and toolkit will be updated gradually.
[2022.04.18] The related paper has been released on arXiv. Please cite our paper if SOTVerse helps your research.
[2022.03.25] The home page and the instructions page have been released! More information will be available soon.

Publications

SOTVerse: A User-defined Task Space of Single Object Tracking.
S. Hu, X. Zhao* and K. Huang* (*corresponding author)
International Journal of Computer Vision (IJCV)
[PDF] [BibTex]

Please cite our paper if SOTVerse helps your research.

How Texts Help? A Fine-grained Evaluation to Reveal the Role of Language in Vision-Language Tracking.
X. Li*, S. Hu*, X. Feng, D. Zhang, M. Wu, J. Zhang, X. Zhao and K. Huang (*Equal Contributions)
ArXiv Preprint
[PDF] [BibTex]

Organizers

Shiyu Hu, Center for Research on Intelligent System and Engineering (CRISE), CASIA.
Xuchen Li, Center for Research on Intelligent System and Engineering (CRISE), CASIA.
Xin Zhao, Center for Research on Intelligent System and Engineering (CRISE), CASIA.
Kaiqi Huang, Center for Research on Intelligent System and Engineering (CRISE), CASIA.

Maintainer

Xuchen Li, Center for Research on Intelligent System and Engineering (CRISE), CASIA.

Contact

Please contact us if you have any problems or suggestions.

SOTVerse & VLTVerse

A User-defined Task Space of Single Object Tracking and A Fine-grained Evaluation in Vision-Language Tracking