Exploring foci of:
arXiv (Cornell University)
CAVGAN: Unifying Jailbreak and Defense of LLMs via Generative Adversarial Attacks on their Internal Representations
July 2025 • Xiaohu Li, Yunfeng Ning, Zepeng Bao, Mayi Xu, Jianhao Chen, Tieyun Qian
Security alignment protects Large Language Models (LLMs) against malicious queries, but various jailbreak attack methods reveal the vulnerability of this security mechanism. Previous studies have treated LLM jailbreak attacks and defenses in isolation. We analyze the security protection mechanism of LLMs and propose a framework that combines attack and defense. Our method is based on the linear separability of LLM intermediate-layer embeddings, as well as the essence of jailbreak attack, whi…
United States Secretary Of Defense
Terminal High Altitude Area Defense
United States Department Of Defense
Generative Artificial Intelligence
Defense Intelligence Agency
Defense Threat Reduction Agency
Generative Pre-Trained Transformer
List Of Equipment Of The Israel Defense Forces
Japan Ground Self-Defense Force
Rafael Advanced Defense Systems
Generative Adversarial Network
Twinkie Defense
National Defense Service Medal
Boeing Defense, Space & Security
List Of Equipment Of The Japan Ground Self-Defense Force
Jewish Defense League
Aegis Ballistic Missile Defense System
Defense Distinguished Service Medal
Defense Superior Service Medal
Ground-Based Midcourse Defense
Defense Meteorological Satellite Program
Missile Defense
Defense Of The Ancients
Man-Portable Air-Defense System