CAVGAN: Unifying Jailbreak and Defense of LLMs via Generative Adversarial Attacks on their Internal Representations

arXiv (Cornell University) • July 2025 • Xiaohu Li, Yunfeng Ning, Zepeng Bao, Mayi Xu, Jianhao Chen, Tieyun Qian

Security alignment enables the Large Language Model (LLM) to gain the protection against malicious queries, but various jailbreak attack methods reveal the vulnerability of this security mechanism. Previous studies have isolated LLM jailbreak attacks and defenses. We analyze the security protection mechanism of the LLM, and propose a framework that combines attack and defense. Our method is based on the linearly separable property of LLM intermediate layer embedding, as well as the essence of jailbreak attack, whi…
