Exploring foci of:
arXiv (Cornell University)
Waffling around for Performance: Visual Classification with Random Words and Broad Concepts
June 2023 • Karsten Roth, Jae Myung Kim, A. Sophia Koepke, Oriol Vinyals, Cordelia Schmid, Zeynep Akata
The visual classification performance of vision-language models such as CLIP has been shown to benefit from additional semantic knowledge from large language models (LLMs) such as GPT-3. In particular, averaging over LLM-generated class descriptors, e.g. "waffle, which has a round shape", can notably improve generalization performance. In this work, we critically study this behavior and propose WaffleCLIP, a framework for zero-shot visual classification which simply replaces LLM-generated descriptors with random c…
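As a rough illustration of the descriptor averaging mentioned in the abstract, the following Python sketch scores an image against each class by averaging its similarity to several prompts whose descriptor slot is filled with random words. It is not the authors' implementation. `toy_embed` is a hypothetical hash-based stand-in for a real vision-language encoder such as CLIP, and the prompt template, the vocabulary, and every function name are assumptions made for this example.

```python
import hashlib
import math
import random


def toy_embed(text, dim=64):
    """Hypothetical stand-in for an encoder: maps text to a deterministic unit vector."""
    seed = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def random_word_prompts(class_name, vocab, n_prompts, rng, words_per_prompt=2):
    """Build prompts whose descriptor slot holds random words (assumed template)."""
    prompts = []
    for _ in range(n_prompts):
        filler = " ".join(rng.choice(vocab) for _ in range(words_per_prompt))
        prompts.append(f"A photo of a {class_name}, which has {filler}.")
    return prompts


def class_score(image_vec, prompts, embed=toy_embed):
    """Average the image-text similarity over all prompts of one class."""
    sims = [cosine(image_vec, embed(p)) for p in prompts]
    return sum(sims) / len(sims)


def classify(image_vec, class_prompts, embed=toy_embed):
    """Return the class with the highest averaged similarity, plus all scores."""
    scores = {c: class_score(image_vec, ps, embed) for c, ps in class_prompts.items()}
    return max(scores, key=scores.get), scores


# Usage with made-up data: the "image" is just another toy embedding.
rng = random.Random(0)
vocab = ["lorem", "ipsum", "dolor", "amet", "quux", "zorp"]  # arbitrary filler words
class_names = ["waffle", "pancake", "bagel"]
class_prompts = {c: random_word_prompts(c, vocab, 5, rng) for c in class_names}
image_vec = toy_embed("stand-in for an image embedding")
label, scores = classify(image_vec, class_prompts)
```

With a real model, `toy_embed` would be replaced by the model's text encoder, and `image_vec` would come from its image encoder. The toy embeddings carry no semantics, so the predicted label here is arbitrary. Only the averaging structure is meant to carry over.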
Computer Science
Generalization
Artificial Intelligence
Machine Learning
Programming Language
Paleontology
Mathematical Analysis
Biology
Philosophy
Mathematics