arXiv (Cornell University)
Jailbreaking Large Language Models with Symbolic Mathematics
September 2024 • Emet Bethany, Mazal Bethany, Juan A. Nolazco‐Flores, Sumit Kumar Jha, Peyman Najafirad
Recent advancements in AI safety have led to increased efforts in training and red-teaming large language models (LLMs) to mitigate unsafe content generation. However, these safety mechanisms may not be comprehensive, leaving potential vulnerabilities unexplored. This paper introduces MathPrompt, a novel jailbreaking technique that exploits LLMs' advanced capabilities in symbolic mathematics to bypass their safety mechanisms. By encoding harmful natural language prompts into mathematical problems, we demonstrate a…
Computer Science
Programming Language
Mathematics
Philosophy