AITtrack: Attention-Based Image-Text Alignment for Visual Tracking

IEEE Access • Vol 13 • January 2025
Basit Alawode, Sajid Javed

Vision-Language Models (VLMs) have recently advanced Visual Object Tracking (VOT) performance. In VLMs, a vision encoder is employed to obtain the visual representation, and a text encoder is employed to estimate textual embeddings from natural language descriptions. By aligning the visual and textual representations, VLMs achieve robust performance in complex and diverse tracking scenarios, efficiently handling dynamic target appearances such as motion blur, occlusion, fast motion, and similar object dis…
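The abstract above does not specify how AITtrack's alignment module is built. As a generic illustration only, the sketch below scores how well a text embedding matches a set of visual patch features: the text embedding acts as a query in scaled dot-product cross-attention over the patches, and the attended visual feature is then compared to the text embedding by cosine similarity. The function names, the lack of learned query/key/value projections, and the toy vectors are assumptions made for this sketch, not the authors' design.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention of a single text query over visual tokens."""
    d = len(query)
    weights = softmax([dot(query, k) / math.sqrt(d) for k in keys])
    out = [0.0] * len(values[0])
    for w, v in zip(weights, values):
        for i, x in enumerate(v):
            out[i] += w * x  # weighted sum of visual token features
    return out


def cosine(a: Vector, b: Vector) -> float:
    na = math.sqrt(dot(a, a))
    nb = math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def alignment_score(text_emb: Vector, patch_feats: List[Vector]) -> float:
    # Hypothetical helper: the text embedding queries the image patches, and the
    # patch features serve directly as keys and values (no learned projections).
    attended = cross_attend(text_emb, patch_feats, patch_feats)
    return cosine(text_emb, attended)
```

For example, `alignment_score([1.0, 0.0], [[1.0, 0.0], [1.0, 0.0]])` returns 1.0, while patches orthogonal to the text embedding, such as `[[0.0, 1.0], [0.0, 1.0]]`, give 0.0. A real tracker would use learned projections, multiple heads, and high-dimensional encoder outputs; this sketch only shows the attention-then-compare pattern.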
