Instruction-Guided Scene Text Recognition

arXiv (Cornell University) • January 2024 • Yongkun Du, Zhineng Chen, Yuchen Su, Caiyan Jia, Yu-Gang Jiang

Multi-modal models have shown appealing performance in visual recognition tasks, as free-form text-guided training evokes the ability to understand fine-grained visual content. However, current models cannot be trivially applied to scene text recognition (STR) due to the compositional difference between natural and text images. We propose a novel instruction-guided scene text recognition (IGTR) paradigm that formulates STR as an instruction learning problem and understands text images by predicting character attri…
