arXiv (Cornell University)
Jailbreaking Large Language Models with Symbolic Mathematics
September 2024 • Emet Bethany, Mazal Bethany, Juan A. Nolazco-Flores, Sumit Kumar Jha, Peyman Najafirad
Recent advancements in AI safety have led to increased efforts in training and red-teaming large language models (LLMs) to mitigate unsafe content generation. However, these safety mechanisms may not be comprehensive, leaving potential vulnerabilities unexplored. This paper introduces MathPrompt, a novel jailbreaking technique that exploits LLMs' advanced capabilities in symbolic mathematics to bypass their safety mechanisms. By encoding harmful natural language prompts into mathematical problems, we demonstrate a…