arXiv (Cornell University)
Instruction-Guided Scene Text Recognition
January 2024 • Yongkun Du, Zhineng Chen, Yuchen Su, Caiyan Jia, Yu-Gang Jiang
Multi-modal models have shown appealing performance in visual recognition tasks, as free-form text-guided training evokes the ability to understand fine-grained visual content. However, current models cannot be trivially applied to scene text recognition (STR) due to the compositional difference between natural and text images. We propose a novel instruction-guided scene text recognition (IGTR) paradigm that formulates STR as an instruction learning problem and understands text images by predicting character attri…
Computer Science
Artificial Intelligence
Computer Vision