PARC: A Quantitative Framework Uncovering the Symmetries within Vision Language Models
June 2025 • Jenny Schmalfuß, Nadine Chang, Vibashan VS, Maying Shen, Andrés Bruhn, Jose M. Álvarez
Vision language models (VLMs) respond to user-crafted text prompts and visual inputs, and are applied to numerous real-world problems. VLMs integrate visual modalities with large language models (LLMs), which are well known to be prompt-sensitive. It is therefore crucial to determine whether VLMs inherit this instability with respect to varying prompts. We investigate which prompt variations VLMs are most sensitive to and which VLMs are most robust to prompt variations. To this end, we introduce PARC (Prompt Analysis …
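Although the abstract is truncated before PARC itself is defined, the question it poses, how stable a VLM's output is under semantically equivalent rewordings of a prompt, can be made concrete with a small sketch. The snippet below is not the paper's PARC framework; the `vlm` callable, the `prompt_sensitivity` helper, and the paraphrase list are illustrative assumptions introduced here for clarity.

```python
# Minimal sketch (not the paper's PARC implementation): quantify how much a
# VLM's answer changes when only the wording of the prompt is varied.
# The `vlm` interface and the paraphrases are hypothetical placeholders.
from collections import Counter
from typing import Callable, List

def prompt_sensitivity(
    vlm: Callable[[str, str], str],  # hypothetical: (image_path, prompt) -> answer
    image_path: str,
    paraphrases: List[str],
) -> float:
    """Return 1 - frequency of the most common answer: 0.0 means the model
    answers identically for every paraphrase (prompt-agnostic), while values
    near 1.0 mean the answer flips with almost every rewording."""
    answers = [vlm(image_path, p).strip().lower() for p in paraphrases]
    most_common_count = Counter(answers).most_common(1)[0][1]
    return 1.0 - most_common_count / len(answers)

# Semantically equivalent prompt variants (assumed, for illustration only)
paraphrases = [
    "What object is shown in the image?",
    "Which object does the image show?",
    "Name the object visible in this picture.",
]
# score = prompt_sensitivity(my_vlm, "cat.jpg", paraphrases)
```

A low score under such a probe would indicate a prompt-agnostic model in the sense the abstract describes; the paper's actual framework presumably goes well beyond this single-image agreement measure.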