Santiago Zanella-Béguelin
Beyond Membership: Limitations of Add/Remove Adjacency in Differential Privacy
Training machine learning models with differential privacy (DP) limits an adversary's ability to infer sensitive information about the training data. The guarantee can be interpreted as a bound on an adversary's capability to distinguish two adjacent da…
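For context, the standard guarantee and the two adjacency relations the title contrasts can be stated as follows (textbook formulation, assumed here rather than taken from the article):

    % (\varepsilon,\delta)-DP for a mechanism M:
    \Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S] + \delta
        \quad \text{for all measurable } S \text{ and all adjacent } D, D'.
    % Add/remove adjacency: D' = D \cup \{x\} \text{ or } D' = D \setminus \{x\}   (sizes differ by one record)
    % Replace adjacency:    D' = (D \setminus \{x\}) \cup \{x'\}                   (same size, one record swapped)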
A Systematization of Security Vulnerabilities in Computer Use Agents
Computer Use Agents (CUAs), autonomous systems that interact with software interfaces via browsers or virtual machines, are rapidly being deployed in consumer and enterprise environments. These agents introduce novel attack surfaces and tr…
A Representation Engineering Perspective on the Effectiveness of Multi-Turn Jailbreaks
Recent research has demonstrated that state-of-the-art LLMs and defenses remain susceptible to multi-turn jailbreak attacks. These attacks require only closed-box model access and are often easy to perform manually, posing a significant th…
Securing AI Agents with Information-Flow Control
As AI agents become increasingly autonomous and capable, ensuring their security against vulnerabilities such as prompt injection becomes critical. This paper explores the use of information-flow control (IFC) to provide security guarantee…
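As a rough illustration of the information-flow idea, a minimal label-propagation sketch is shown below; the classes, labels, and policy here are illustrative assumptions, not the paper's system:

    from dataclasses import dataclass

    LEVELS = {"public": 0, "internal": 1, "secret": 2}

    @dataclass(frozen=True)
    class Labeled:
        value: str
        label: str  # key into LEVELS

    def join(a: str, b: str) -> str:
        # Least upper bound: data derived from both inputs gets the higher label.
        return a if LEVELS[a] >= LEVELS[b] else b

    def combine(x: Labeled, y: Labeled) -> Labeled:
        # Stand-in for an LLM step that mixes two inputs into one output.
        return Labeled(x.value + " " + y.value, join(x.label, y.label))

    def call_tool(clearance: str, arg: Labeled) -> None:
        # A tool may only receive data at or below its clearance.
        if LEVELS[arg.label] > LEVELS[clearance]:
            raise PermissionError(f"blocked: {arg.label} data sent to {clearance} tool")
        print("tool called with:", arg.value)

    summary = combine(Labeled("summarize:", "public"),
                      Labeled("quarterly numbers", "secret"))
    try:
        call_tool("public", summary)   # label joined to "secret", so this flow is blocked
    except PermissionError as e:
        print(e)

The point of the mechanism: even if a prompt injection steers the agent toward an exfiltrating tool call, the label carried by the sensitive data stops the flow at the tool boundary.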
The Price of Intelligence
The vulnerability of LLMs to hallucination, prompt injection, and jailbreaks poses a significant but surmountable challenge to their widespread adoption and responsible use. We have argued that these problems are inherent, certainly in the…
Permissive Information-Flow Analysis for Large Language Models
Large Language Models (LLMs) are rapidly becoming commodity components of larger software systems. This poses natural security and privacy problems: poisoned data retrieved from one component can change the model's behavior and compromise …
Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition
Large language model systems face important security risks from maliciously crafted messages that aim to overwrite the system's original instructions or leak private data. To study this problem, we organized a capture-the-flag competition …
Closed-Form Bounds for DP-SGD against Record-level Inference
Machine learning models trained with differentially-private (DP) algorithms such as DP-SGD enjoy resilience against a wide range of privacy attacks. Although it is possible to derive bounds for some attacks based solely on an $(\varepsilon…
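For reference, the classic closed-form conversion from an (ε, δ) guarantee to a bound on record-level inference is the hypothesis-testing bound below (a standard, generally looser relation than the paper's tailored bounds):

    \mathrm{TPR} \;\le\; e^{\varepsilon}\,\mathrm{FPR} + \delta,
    \qquad\text{and for } \delta = 0 \text{ with a balanced prior,}\qquad
    \mathrm{Acc} \;\le\; \frac{e^{\varepsilon}}{1 + e^{\varepsilon}}.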
Rethinking Privacy in Machine Learning Pipelines from an Information Flow Control Perspective
Modern machine learning systems use models trained on ever-growing corpora. Typically, metadata such as ownership, access control, or licensing information is ignored during training. Instead, to mitigate privacy risks, we rely on generic …
On the Efficacy of Differentially Private Few-shot Image Classification
There has been significant recent progress in training differentially private (DP) models which achieve accuracy that approaches the best non-private models. These DP models are typically pretrained on large public datasets and then fine-t…
Analyzing Leakage of Personally Identifiable Information in Language Models
Language Models (LMs) have been shown to leak information about training data through sentence-level membership inference and reconstruction attacks. Understanding the risk of LMs leaking Personally Identifiable Information (PII) has recei…
SoK: Let the Privacy Games Begin! A Unified Treatment of Data Inference Privacy in Machine Learning
Deploying machine learning models in production may allow adversaries to infer sensitive information about training data. There is a vast literature analyzing different types of inference risks, ranging from membership inference to reconst…
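To give a flavor of the game-based framing, here is a minimal sketch of a membership-inference game as such systematizations typically formalize it (illustrative only; the paper's exact game definitions may differ):

    import random

    def membership_game(train, sample_record, adversary):
        """One round: the challenger draws a challenge record, secretly decides
        whether to include it in training, and the adversary guesses that
        decision given the trained model and the record."""
        z = sample_record()                        # challenge record
        b = random.randrange(2)                    # challenger's secret bit
        dataset = [sample_record() for _ in range(99)]
        if b == 1:
            dataset.append(z)                      # "member" world
        model = train(dataset)
        return adversary(model, z) == b            # True iff the adversary wins

    # The adversary's advantage is 2 * Pr[win] - 1, estimated over many rounds.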
Bayesian Estimation of Differential Privacy
Algorithms such as Differentially Private SGD enable training machine learning models with formal privacy guarantees. However, there is a discrepancy between the protection that such algorithms guarantee in theory and the protection they a…
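One way to see the estimation problem: an attack's observed error rates imply a lower bound on the effective ε, obtained by inverting the hypothesis-testing characterization of DP (the standard auditing relation, stated here for context rather than the paper's Bayesian estimator):

    \hat{\varepsilon} \;\ge\; \max\!\left(
        \ln\frac{1 - \delta - \mathrm{FNR}}{\mathrm{FPR}},\;
        \ln\frac{1 - \delta - \mathrm{FPR}}{\mathrm{FNR}}
    \right)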
Analyzing Information Leakage of Updates to Natural Language Models
To continuously improve quality and reflect changes in data, machine learning applications have to regularly retrain and update their core models. We show that a differential analysis of language model snapshots before and after an upda…
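To make "differential analysis" concrete, one plausible form is scoring candidate phrases by how much their likelihood shifts between the two snapshots; the helpers and data below are illustrative assumptions, not the paper's exact procedure:

    # Phrases whose probability rose sharply between snapshots hint at content
    # introduced by the update. `old_logprob` / `new_logprob` are assumed
    # callables returning a phrase's total log-probability under each snapshot.
    def differential_score(phrase, old_logprob, new_logprob):
        return new_logprob(phrase) - old_logprob(phrase)   # log-likelihood ratio

    candidates = ["new product launches in march", "the weather is nice today"]
    # ranked = sorted(candidates,
    #                 key=lambda s: differential_score(s, old_lp, new_lp),
    #                 reverse=True)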
HACLxN: Verified Generic SIMD Crypto (for all your favourite platforms)
We present a new methodology for building formally verified cryptographic libraries that are optimized for multiple architectures. In particular, we show how to write and verify generic crypto code in the F* programming language that explo…
EverCrypt: A Fast, Verified, Cross-Platform Cryptographic Provider
Analyzing Privacy Loss in Updates of Natural Language Models
To continuously improve quality and reflect changes in data, machine learning-based services have to regularly re-train and update their core models. In the setting of language models, we show that a comparative analysis of model snapshots…
Imperfect forward secrecy
We investigate the security of Diffie-Hellman key exchange as used in popular Internet protocols and find it to be less secure than widely believed. First, we present Logjam, a novel flaw in TLS that lets a man-in-the-middle downgrade conn…
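For background, textbook finite-field Diffie-Hellman with public parameters (p, g) and secrets a, b:

    A = g^{a} \bmod p, \qquad B = g^{b} \bmod p, \qquad
    k = A^{b} = B^{a} = g^{ab} \bmod p.
    % Security rests on the hardness of recovering a from A (discrete log in \mathbb{Z}_p^*);
    % for 512-bit "export-grade" p, precomputation makes discrete logs feasible,
    % which is what a downgrade to such groups exposes.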
Towards Automated Proving of Relational Properties of Probabilistic Programs (Invited Talk)
Some security properties go beyond what is expressible in terms of an individual execution of a single program. In particular, many security policies in cryptography can be naturally phrased as relational properties of two open probabilist…
Verified low-level programming embedded in F*
We present Low*, a language for low-level programming and verification, and its application to high-assurance optimized cryptographic libraries. Low* is a shallow embedding of a small, sequential, well-behaved subset of C in F*, a dependen…
Implementing and Proving the TLS 1.3 Record Layer
A Monadic Framework for Relational Verification: Applied to Information Security, Program Equivalence, and Optimizations
Relational properties describe multiple runs of one or more programs. They characterize many useful notions of security, program refinement, and equivalence for programs with diverse computational effects, and they have received much atten…
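A canonical example of such a relational property is noninterference, which relates two runs of the same program (standard formulation, given here for illustration):

    \forall s_1, s_2.\quad s_1 =_{L} s_2 \;\Longrightarrow\; P(s_1) =_{L} P(s_2)

Here =_L means agreement on the low (public) part of the state: two executions that agree on public inputs must agree on public outputs, so secrets cannot influence what a public observer sees.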
Formal Verification of Smart Contracts
Dependent types and multi-monadic effects in F*