arXiv (Cornell University)
Jailbreaking Large Language Models with Symbolic Mathematics
September 2024 • Emet Bethany, Mazal Bethany, Juan A. Nolazco-Flores, Sumit Kumar Jha, Peyman Najafirad
Recent advancements in AI safety have led to increased efforts in training and red-teaming large language models (LLMs) to mitigate unsafe content generation. However, these safety mechanisms may not be comprehensive, leaving potential vulnerabilities unexplored. This paper introduces MathPrompt, a novel jailbreaking technique that exploits LLMs' advanced capabilities in symbolic mathematics to bypass their safety mechanisms. By encoding harmful natural language prompts into mathematical problems, we demonstrate a…