SOTVerse is a user-defined task space for single object tracking (SOT). It lets users customize SOT tasks to match their research purposes, which makes research more targeted and can improve research efficiency.
The 3E paradigm describes computer vision tasks in terms of three components: environment, evaluation, and executor. We combine the environment and evaluation to form SOTVerse, a user-defined single object tracking task space, and conduct experiments in this space to assess executors' tracking ability. The paradigm can also be extended to describe other visual tasks comprehensively and help users improve their research efficiency.
We organize existing benchmarks to form the environment of SOTVerse, which includes 12.56 million frames with frame-level challenging-attribute labels to model the real world. In addition, an environment generation method is provided to help researchers efficiently build their own task space.
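As an illustration of how frame-level attribute labels might be used to carve out a custom task space, here is a minimal Python sketch. It is not the SOTVerse environment generation method: the function name `select_subsequences`, the label layout (one dictionary of boolean flags per frame), and the `min_length` parameter are all assumptions made for this example.

```python
from typing import Dict, List, Tuple


def select_subsequences(
    frame_labels: List[Dict[str, bool]],
    attribute: str,
    min_length: int = 1,
) -> List[Tuple[int, int]]:
    """Find contiguous runs of frames flagged with `attribute`.

    Hypothetical helper: `frame_labels` holds one dict of boolean
    attribute flags per frame. Returns (start, end) index pairs with
    `end` exclusive, keeping only runs of at least `min_length` frames.
    """
    runs: List[Tuple[int, int]] = []
    start = None  # index where the current run began, if any
    for i, labels in enumerate(frame_labels):
        if labels.get(attribute, False):
            if start is None:
                start = i
        elif start is not None:
            # The run ended at frame i - 1; keep it if long enough.
            if i - start >= min_length:
                runs.append((start, i))
            start = None
    # Close a run that extends to the last frame.
    if start is not None and len(frame_labels) - start >= min_length:
        runs.append((start, len(frame_labels)))
    return runs


# Toy sequence of five frames with a made-up "occlusion" flag.
labels = [
    {"occlusion": False},
    {"occlusion": True},
    {"occlusion": True},
    {"occlusion": False},
    {"occlusion": True},
]
occluded_runs = select_subsequences(labels, "occlusion", min_length=2)
```

Collecting such runs across many sequences would give a set of sub-sequences focused on a single challenge; the real labels and selection rules should be taken from the SOTVerse release itself.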
We first identify the limitations of existing evaluation systems and metrics through detailed analysis, then design a new evaluation scheme for SOTVerse that includes two mechanisms and new metrics to accommodate various tasks.
We conduct extensive experiments in SOTVerse and analyze the performance of various executors. The experimental results reveal the shortcomings of existing work and verify the effectiveness of the evaluation scheme in SOTVerse.
Please cite our paper if SOTVerse helps your research.
Please contact us if you have any questions or suggestions.