arXiv (Cornell University)
Securing Vision-Language Models with a Robust Encoder Against Jailbreak and Adversarial Attacks
September 2024 • Md Zarif Hossain, Ahmed Imteaj
Large Vision-Language Models (LVLMs), trained on large multimodal datasets, have significantly advanced AI by excelling in vision-language tasks. However, these models remain vulnerable to adversarial attacks, particularly jailbreak attacks, which bypass safety protocols and cause the model to generate misleading or harmful responses. This vulnerability stems both from the inherent susceptibilities of the underlying LLM and from the expanded attack surface introduced by the visual modality. We propose Sim-CLIP+, a novel defense mechan…