Exploring foci of
2023-11-20
What's left can't be right -- The remaining positional incompetence of contrastive vision-language models
2023-11-20 • Nils Hoehing, Ellen Rushe, Anthony Ventresque
Contrastive vision-language models like CLIP have been found to lack spatial understanding capabilities. In this paper we discuss the possible causes of this phenomenon by analysing both datasets and embedding space. By focusing on simple left-right positional relations, we show that this behaviour is entirely predictable, even with large-scale datasets, demonstrate that these relations can be taught using synthetic data and show that this approach can generalise well to natural images - improving the performance …
Left Anterior Descending Artery
Left-Handed Wife
Can't Fight The Moonlight
Can't Tell Me Nothing (Mixtape)
Left Party (Sweden)
What's Eating Gilbert Grape
Can't Fight This Feeling
No Tears Left To Cry
Can't Say I Ain't Country