Robert Stanforth
Scalable watermarking for identifying large language model outputs
Large language models (LLMs) have enabled the generation of high-quality synthetic text, often indistinguishable from human-written content, at a scale that can markedly affect the nature of the information ecosystem [1–3]. Watermarking can…
Operationalizing Contextual Integrity in Privacy-Conscious Assistants
Advanced AI assistants combine frontier LLMs and tool access to autonomously perform complex tasks on behalf of users. While the helpfulness of such assistants can increase dramatically with access to user information including emails and …
Verified Neural Compressed Sensing
We develop the first (to the best of our knowledge) provably correct neural networks for a precise computational task, with the proof of correctness generated by an automated verification algorithm without any human input. Prior work on ne…
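For context, the computational task here is classical compressed sensing: recover a sparse signal from far fewer linear measurements than its dimension. In standard notation (illustrative, not quoted from the truncated abstract):

\[
\text{recover } x \in \mathbb{R}^n,\ \|x\|_0 \le k, \quad \text{from} \quad y = A x, \qquad A \in \mathbb{R}^{m \times n},\ m \ll n,
\]

and a verified network for this task would come with a machine-checked proof that its reconstruction is correct for every input in the specified set.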
Unlocking Accuracy and Fairness in Differentially Private Image Classification
Privacy-preserving machine learning aims to train models on private data without leaking sensitive information. Differential privacy (DP) is considered the gold standard framework for privacy-preserving training, as it provides formal priv…
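For context, differentially private training in this setting typically builds on DP-SGD: clip each example's gradient and add calibrated Gaussian noise before the update. A minimal numpy sketch under that assumption (hyperparameters and names are illustrative, not the paper's recipe):

import numpy as np

def dp_sgd_step(params, per_example_grads, clip_norm=1.0, noise_multiplier=1.0, lr=0.1):
    # Clip each per-example gradient so no single example dominates the update.
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    # Sum, add Gaussian noise scaled to the clipping norm, then average and step.
    noisy_sum = np.sum(clipped, axis=0) + np.random.normal(
        scale=noise_multiplier * clip_norm, size=params.shape)
    return params - lr * noisy_sum / len(per_example_grads)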
Expressive Losses for Verified Robustness via Convex Combinations
In order to train networks for verified adversarial robustness, it is common to over-approximate the worst-case loss over perturbation regions, resulting in networks that attain verifiability at the expense of standard performance. As show…
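Concretely, the convex combination in the title blends an attack-based loss with a verified (over-approximated) loss; one illustrative way to write it, with notation assumed rather than quoted:

\[
\mathcal{L}_\alpha(x, y) = (1-\alpha)\, \mathcal{L}\bigl(f(x^{\mathrm{adv}}), y\bigr) + \alpha\, \overline{\mathcal{L}}_{\mathrm{ver}}(x, y), \qquad \alpha \in [0, 1],
\]

so that a single coefficient interpolates between standard adversarial training (α = 0) and purely verified training (α = 1).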
Differentially Private Diffusion Models Generate Useful Synthetic Images
The ability to generate privacy-preserving synthetic versions of sensitive image datasets could unlock numerous ML applications currently constrained by data availability. Due to their astonishing image generation quality, diffusion models…
Enhancing the reliability and accuracy of AI-enabled diagnosis via complementarity-driven deferral to clinicians (CoDoC)
Diagnostic AI systems trained using deep learning have been shown to achieve expert-level identification of diseases in multiple medical imaging settings [1,2]. However, such systems are not always reliable and can fail in cases diagnosed acc…
IBP Regularization for Verified Adversarial Robustness via Branch-and-Bound
Recent works have tried to increase the verifiability of adversarially trained networks by running the attacks over domains larger than the original perturbations and adding various regularization terms to the objective. However, these alg…
Verifying Probabilistic Specifications with Functional Lagrangians
We propose a general framework for verifying input-output specifications of neural networks using functional Lagrange multipliers that generalizes standard Lagrangian duality. We derive theoretical properties of the framework, which can ha…
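To sketch the idea with assumed notation: write the network layer by layer as z_{k+1} = h_k(z_k) with input set Z_0 and specification ψ on the output. For any choice of functions λ_1, …, λ_{K−1} (with λ_0 ≡ 0 and λ_K = ψ), a layerwise bound of the following form holds:

\[
\max_{z_0 \in \mathcal{Z}_0} \psi\bigl(h_{K-1} \circ \cdots \circ h_0(z_0)\bigr)
\;\le\; \sum_{k=0}^{K-1} \max_{z_k \in \mathcal{Z}_k} \bigl[ \lambda_{k+1}(h_k(z_k)) - \lambda_k(z_k) \bigr],
\]

provided each Z_k contains the reachable activations at layer k; restricting the λ_k to be linear recovers standard Lagrangian duality.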
Make Sure You're Unsure: A Framework for Verifying Probabilistic Specifications
Most real world applications require dealing with stochasticity like sensor noise or predictive uncertainty, where formal specifications of desired behavior are inherently probabilistic. Despite the promise of formal verification in ensuri…
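An example of such a probabilistic specification (notation illustrative): under Gaussian sensor noise the network should retain the desired property with high probability,

\[
\mathbb{P}_{\delta \sim \mathcal{N}(0, \sigma^2 I)}\bigl[\, f(x + \delta) \in S \,\bigr] \;\ge\; 1 - \epsilon .
\]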
Contrastive Training for Improved Out-of-Distribution Detection
Reliable detection of out-of-distribution (OOD) inputs is increasingly understood to be a precondition for deployment of machine learning systems. This paper proposes and investigates the use of contrastive training to boost OOD detection …
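One common recipe for turning learned representations into an OOD score, sketched here as an illustration rather than the paper's exact pipeline: fit class-conditional Gaussians to in-distribution features and score new inputs by their minimum Mahalanobis distance.

import numpy as np

def fit_gaussians(features, labels):
    # Per-class means plus a shared (tied) covariance over in-distribution features.
    classes = np.unique(labels)
    means = {c: features[labels == c].mean(axis=0) for c in classes}
    centered = np.vstack([features[labels == c] - means[c] for c in classes])
    cov = np.cov(centered, rowvar=False) + 1e-6 * np.eye(features.shape[1])
    return means, np.linalg.inv(cov)

def ood_score(x_feat, means, cov_inv):
    # Larger minimum Mahalanobis distance => more likely out-of-distribution.
    return min(float((x_feat - m) @ cov_inv @ (x_feat - m)) for m in means.values())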
Reducing Sentiment Bias in Language Models via Counterfactual Evaluation
Po-Sen Huang, Huan Zhang, Ray Jiang, Robert Stanforth, Johannes Welbl, Jack Rae, Vishal Maini, Dani Yogatama, Pushmeet Kohli. Findings of the Association for Computational Linguistics: EMNLP 2020. 2020.
Advances in language modeling architectures and the availability of large text corpora have driven progress in automatic text generation. While this results in models capable of generating coherent texts, it also prompts models to internal…
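The counterfactual evaluation idea is to swap a sensitive attribute in the conditioning text and compare the sentiment of the model's continuations across the two versions. A minimal sketch in which the generator, sentiment scorer, and gap measure are placeholders rather than the paper's specific choices:

def counterfactual_sentiment_gap(lm_generate, sentiment_score, prompt_template,
                                 attribute_pair, n_samples=100):
    # Generate continuations for a prompt and its counterfactual (attribute swapped),
    # then compare the two sentiment-score distributions.
    a, b = attribute_pair  # e.g. two occupation or country terms
    scores_a = [sentiment_score(lm_generate(prompt_template.format(a))) for _ in range(n_samples)]
    scores_b = [sentiment_score(lm_generate(prompt_template.format(b))) for _ in range(n_samples)]
    # Simple placeholder gap: difference of mean sentiment; the paper's metric may differ.
    return abs(sum(scores_a) / n_samples - sum(scores_b) / n_samples)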
Are Labels Required for Improving Adversarial Robustness?
Recent work has uncovered the interesting (and somewhat surprising) finding that training models to be invariant to adversarial perturbations requires substantially larger datasets than those required for standard classification. This resu…
Achieving Verified Robustness to Symbol Substitutions via Interval Bound Propagation
Neural networks are part of many contemporary NLP systems, yet their empirical successes come at the price of vulnerability to adversarial attacks. Previous work has used adversarial training and data augmentation to partially mitigate suc…
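For word-substitution attacks, a natural input region for bound propagation is an axis-aligned box over the embeddings of each word and its allowed substitutes; a small sketch under that assumption (function names are illustrative):

import numpy as np

def substitution_box(embed, word, substitutes):
    # Elementwise lower/upper bounds over the embeddings of a word and its allowed substitutes.
    vecs = np.stack([embed(w) for w in [word] + list(substitutes)])
    return vecs.min(axis=0), vecs.max(axis=0)

Interval bound propagation then pushes these per-position boxes through the rest of the network.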
A Dual Approach to Verify and Train Deep Networks
This paper addresses the problem of formally verifying desirable properties of neural networks, i.e., obtaining provable guarantees that neural networks satisfy specifications relating their inputs and outputs (e.g., robustness to bounded …
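The robustness specification mentioned can be written as an explicit input-output property (notation illustrative):

\[
\forall\, \delta \text{ with } \|\delta\|_\infty \le \epsilon:\quad [f(x+\delta)]_y - [f(x+\delta)]_j \ge 0 \quad \text{for all } j \ne y,
\]

i.e., no bounded perturbation of x can make a wrong class outscore the true label y; verification seeks a proof that this holds, rather than merely failing to find a counterexample.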
Adversarial Robustness through Local Linearization
Adversarial training is an effective methodology for training deep neural networks that are robust against adversarial, norm-bounded perturbations. However, the computational cost of adversarial training grows prohibitively as the size of …
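One way to write the local-linearity measure that this style of regularizer penalizes (notation assumed rather than quoted):

\[
\gamma(\epsilon, x) \;=\; \max_{\|\delta\| \le \epsilon} \bigl| \ell(x + \delta; y) - \ell(x; y) - \delta^{\top} \nabla_x \ell(x; y) \bigr|,
\]

which is small exactly when the loss surface is close to linear around x, so cheap single-step attacks remain informative and the cost of full multi-step adversarial training can be reduced.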
Towards Stable and Efficient Training of Verifiably Robust Neural Networks
Training neural networks with verifiable robustness guarantees is challenging. Several existing approaches utilize linear relaxation based neural network output bounds under perturbation, but they can slow down training by a factor of hund…
Verification of Non-Linear Specifications for Neural Networks
Prior work on neural network verification has focused on specifications that are linear functions of the output of the network, e.g., invariance of the classifier output under adversarial perturbations of the input. In this paper, we exten…
Achieving Verified Robustness to Symbol Substitutions via Interval Bound Propagation
Po-Sen Huang, Robert Stanforth, Johannes Welbl, Chris Dyer, Dani Yogatama, Sven Gowal, Krishnamurthy Dvijotham, Pushmeet Kohli. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th Internationa…
Strength in Numbers: Trading-off Robustness and Computation via Adversarially-Trained Ensembles
While deep learning has led to remarkable results on a number of challenging problems, researchers have discovered a vulnerability of neural networks in adversarial settings, where small but carefully chosen perturbations to the input can …
On the Effectiveness of Interval Bound Propagation for Training Verifiably Robust Models
Recent work has shown that it is possible to train deep neural networks that are provably robust to norm-bounded adversarial perturbations. Most of these methods are based on minimizing an upper bound on the worst-case loss over all possib…
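The core primitive in this approach, interval bound propagation, pushes elementwise lower/upper bounds through each layer. A minimal numpy sketch for an affine layer followed by a ReLU (names illustrative):

import numpy as np

def ibp_affine(l, u, W, b):
    # Propagate the box [l, u] through x -> W x + b using its center/radius form.
    center, radius = (u + l) / 2.0, (u - l) / 2.0
    new_center = W @ center + b
    new_radius = np.abs(W) @ radius
    return new_center - new_radius, new_center + new_radius

def ibp_relu(l, u):
    # ReLU is monotone, so the bounds pass through elementwise.
    return np.maximum(l, 0.0), np.maximum(u, 0.0)

Training then minimizes an upper bound on the worst-case loss computed from the bounds at the output layer.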
Training verified learners with learned verifiers
This paper proposes a new algorithmic framework, predictor-verifier training, to train neural networks that are verifiable, i.e., networks that provably satisfy some desired input-output properties. The key idea is to simultaneously train …
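Schematically, the two networks are coupled in a single objective; a heavily simplified sketch of the kind of joint loss involved (the weighting κ and the exact form of the verifier's bound are assumptions, not taken from the paper):

\[
\min_{\theta, \phi}\; \mathbb{E}_{(x,y)}\Bigl[ (1-\kappa)\, \ell\bigl(f_\theta(x), y\bigr) + \kappa\, \mathcal{B}_\phi\bigl(f_\theta, x, y, \epsilon\bigr) \Bigr],
\]

where f_θ is the predictor and B_φ is the verifier's valid upper bound on the worst-case loss over the ε-ball around x, so the predictor learns to be easy to verify while the verifier learns to make its bound tight.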
A Dual Approach to Scalable Verification of Deep Networks
This paper addresses the problem of formally verifying desirable properties of neural networks, i.e., obtaining provable guarantees that neural networks satisfy specifications relating their inputs and outputs (robustness to bounded norm a…