Language Models as Zero-shot Visual Semantic Learners

arXiv (Cornell University) • July 2021 • Yue Jiao, Jonathon Hare, Adam Prügel‐Bennett

Visual Semantic Embedding (VSE) models, which map images into a rich semantic embedding space, have been a milestone in object recognition and zero-shot learning. Current approaches to VSE heavily rely on static word embedding techniques. In this work, we propose a Visual Semantic Embedding Probe (VSEP) designed to probe the semantic information of contextualized word embeddings in visual semantic understanding tasks. We show that the knowledge encoded in transformer language models can be exploited for tasks re…
