SOTVerse & VLTVerse

A User-defined Task Space of Single Object Tracking and A Fine-grained Evaluation in Vision-Language Tracking

SOTVerse is a user-defined task space for single object tracking (SOT). It lets users customize SOT tasks to match their research purposes, which makes research more targeted and can significantly improve its efficiency. VLTVerse is the first fine-grained evaluation framework for vision-language tracking (VLT) trackers. It jointly considers multiple challenge factors and diverse semantic information, with the aim of revealing the role of language in VLT.

Main Contributions

A 3E paradigm to describe computer vision tasks.

The 3E paradigm describes computer vision tasks in terms of environment, evaluation, and executor. We combine the environment and evaluation to form SOTVerse, a user-defined single object tracking task space, and conduct experiments in this space to assess executors' tracking ability. The paradigm can also be extended to describe other visual tasks comprehensively and to help users improve their research efficiency.

A comprehensive and user-defined environment.

We organize existing benchmarks to form the environment of SOTVerse, which contains 12.56 million frames with frame-level challenging attribute labels to model the real world. In addition, an environment generation method helps researchers efficiently form their own task space.
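To illustrate how frame-level attribute labels could be used to carve out a user-defined task space, here is a minimal sketch. It is not the SOTVerse generation method, which this page does not describe. The `Sequence` class, the `select_subspace` function, the `min_ratio` threshold, and the attribute names `fast_motion` and `occlusion` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Sequence:
    """Hypothetical video sequence: one set of attribute tags per frame."""
    name: str
    frame_attributes: List[Set[str]] = field(default_factory=list)


def select_subspace(sequences, attribute, min_ratio=0.5):
    """Return names of sequences in which `attribute` is tagged on at least
    `min_ratio` of the frames (an assumed selection rule, not SOTVerse's)."""
    selected = []
    for seq in sequences:
        if not seq.frame_attributes:
            continue  # skip sequences without labels
        hits = sum(1 for tags in seq.frame_attributes if attribute in tags)
        if hits / len(seq.frame_attributes) >= min_ratio:
            selected.append(seq.name)
    return selected


# Toy data with made-up attribute names.
sequences = [
    Sequence("seq_a", [{"fast_motion"}, {"fast_motion", "occlusion"}, set(), {"fast_motion"}]),
    Sequence("seq_b", [{"occlusion"}, set(), set(), set()]),
]

fast_motion_task = select_subspace(sequences, "fast_motion", min_ratio=0.5)
```

A researcher interested in a single challenge would then evaluate trackers only on the selected sequences, which is the sense in which the task space is user-defined.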

A thorough evaluation scheme.

We first identify the limitations of existing evaluation systems and indicators through detailed analysis, and then design a new evaluation scheme for SOTVerse that includes two mechanisms and new metrics to suit various tasks.

A fine-grained evaluation.

VLTVerse introduces 10 sequence-level challenge labels and 6 types of multi-granularity semantic information, creating a flexible and multi-dimensional evaluation space for VLT.
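One way to picture this multi-dimensional evaluation space is as a grid indexed by challenge label and semantic information type, with one aggregated score per cell. The sketch below assumes per-sequence scores are already available. The record layout, the `aggregate_grid` function, and the placeholder label names are hypothetical and are not taken from VLTVerse.

```python
from collections import defaultdict
from statistics import mean


def aggregate_grid(records):
    """Average scores per (challenge label, semantic type) cell.

    `records` is an assumed format: dicts with the keys
    'challenge', 'text_type', and 'score'.
    """
    buckets = defaultdict(list)
    for rec in records:
        buckets[(rec["challenge"], rec["text_type"])].append(rec["score"])
    return {cell: mean(scores) for cell, scores in buckets.items()}


# Toy records with placeholder label names and made-up scores.
records = [
    {"challenge": "challenge_1", "text_type": "text_a", "score": 0.60},
    {"challenge": "challenge_1", "text_type": "text_a", "score": 0.40},
    {"challenge": "challenge_1", "text_type": "text_b", "score": 0.70},
    {"challenge": "challenge_2", "text_type": "text_a", "score": 0.30},
]

grid = aggregate_grid(records)
```

Comparing cells along the semantic-type axis for a fixed challenge label is one simple way to ask whether a given kind of text helps under that challenge.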

Various experimental executors and detailed analysis.

We conduct extensive experiments in SOTVerse and VLTVerse and analyze the performance of various executors. The experimental results reveal shortcomings of existing work and verify the effectiveness of the evaluation schemes in SOTVerse and VLTVerse.

Latest News

Publications

SOTVerse: A User-defined Task Space of Single Object Tracking.
S. Hu, X. Zhao* and K. Huang* (*corresponding author)
International Journal of Computer Vision (IJCV)

Please cite our paper if SOTVerse helps your research.

How Texts Help? A Fine-grained Evaluation to Reveal the Role of Language in Vision-Language Tracking.
X. Li*, S. Hu*, X. Feng, D. Zhang, M. Wu, J. Zhang, X. Zhao and K. Huang (*Equal Contributions)
arXiv preprint

Organizers

Maintainer

Contact

Please contact us if you have any questions or suggestions.