Nils Hoehing
What's left can't be right -- The remaining positional incompetence of contrastive vision-language models
Contrastive vision-language models like CLIP have been found to lack spatial understanding capabilities. In this paper, we discuss the possible causes of this phenomenon by analysing both datasets and embedding space. By focusing on simple …