Robert Stanforth
Scalable watermarking for identifying large language model outputs
Large language models (LLMs) have enabled the generation of high-quality synthetic text, often indistinguishable from human-written content, at a scale that can markedly affect the nature of the information ecosystem [1–3]. Watermarking can…
Operationalizing Contextual Integrity in Privacy-Conscious Assistants
Advanced AI assistants combine frontier LLMs and tool access to autonomously perform complex tasks on behalf of users. While the helpfulness of such assistants can increase dramatically with access to user information including emails and …
Verified Neural Compressed Sensing
We develop the first (to the best of our knowledge) provably correct neural networks for a precise computational task, with the proof of correctness generated by an automated verification algorithm without any human input. Prior work on ne…
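For context, the computational task here is classical compressed sensing: recover a sparse signal from far fewer linear measurements than its dimension. In standard notation (illustrative, not quoted from the truncated abstract):

\[
\text{recover } x \in \mathbb{R}^n,\ \|x\|_0 \le k, \quad \text{from} \quad y = A x, \qquad A \in \mathbb{R}^{m \times n},\ m \ll n,
\]

and a verified network for this task would come with a machine-checked proof that its reconstruction is correct for every input in the specified set.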
Unlocking Accuracy and Fairness in Differentially Private Image Classification
Privacy-preserving machine learning aims to train models on private data without leaking sensitive information. Differential privacy (DP) is considered the gold standard framework for privacy-preserving training, as it provides formal priv…
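For context, differentially private training in this setting typically builds on DP-SGD: clip each example's gradient and add calibrated Gaussian noise before the update. A minimal numpy sketch under that assumption (hyperparameters and names are illustrative, not the paper's recipe):

import numpy as np

def dp_sgd_step(params, per_example_grads, clip_norm=1.0, noise_multiplier=1.0, lr=0.1):
    # Clip each per-example gradient so no single example dominates the update.
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    # Sum, add Gaussian noise scaled to the clipping norm, then average and step.
    noisy_sum = np.sum(clipped, axis=0) + np.random.normal(
        scale=noise_multiplier * clip_norm, size=params.shape)
    return params - lr * noisy_sum / len(per_example_grads)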
Expressive Losses for Verified Robustness via Convex Combinations
In order to train networks for verified adversarial robustness, it is common to over-approximate the worst-case loss over perturbation regions, resulting in networks that attain verifiability at the expense of standard performance. As show…
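Concretely, the convex combination in the title blends an attack-based loss with a verified (over-approximated) loss; one illustrative way to write it, with notation assumed rather than quoted:

\[
\mathcal{L}_\alpha(x, y) = (1-\alpha)\, \mathcal{L}\bigl(f(x^{\mathrm{adv}}), y\bigr) + \alpha\, \overline{\mathcal{L}}_{\mathrm{ver}}(x, y), \qquad \alpha \in [0, 1],
\]

so that a single coefficient interpolates between standard adversarial training (α = 0) and purely verified training (α = 1).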
Differentially Private Diffusion Models Generate Useful Synthetic Images
The ability to generate privacy-preserving synthetic versions of sensitive image datasets could unlock numerous ML applications currently constrained by data availability. Due to their astonishing image generation quality, diffusion models…
Enhancing the reliability and accuracy of AI-enabled diagnosis via complementarity-driven deferral to clinicians (CoDoC)
Diagnostic AI systems trained using deep learning have been shown to achieve expert-level identification of diseases in multiple medical imaging settings [1,2]. However, such systems are not always reliable and can fail in cases diagnosed acc…
IBP Regularization for Verified Adversarial Robustness via Branch-and-Bound
Recent works have tried to increase the verifiability of adversarially trained networks by running the attacks over domains larger than the original perturbations and adding various regularization terms to the objective. However, these alg…
Verifying Probabilistic Specifications with Functional Lagrangians
We propose a general framework for verifying input-output specifications of neural networks using functional Lagrange multipliers that generalizes standard Lagrangian duality. We derive theoretical properties of the framework, which can ha…
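To sketch the idea with assumed notation: write the network layer by layer as z_{k+1} = h_k(z_k) with input set Z_0 and specification ψ on the output. For any choice of functions λ_1, …, λ_{K−1} (with λ_0 ≡ 0 and λ_K = ψ), a layerwise bound of the following form holds:

\[
\max_{z_0 \in \mathcal{Z}_0} \psi\bigl(h_{K-1} \circ \cdots \circ h_0(z_0)\bigr)
\;\le\; \sum_{k=0}^{K-1} \max_{z_k \in \mathcal{Z}_k} \bigl[ \lambda_{k+1}(h_k(z_k)) - \lambda_k(z_k) \bigr],
\]

provided each Z_k contains the reachable activations at layer k; restricting the λ_k to be linear recovers standard Lagrangian duality.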
Make Sure You're Unsure: A Framework for Verifying Probabilistic Specifications
Most real world applications require dealing with stochasticity like sensor noise or predictive uncertainty, where formal specifications of desired behavior are inherently probabilistic. Despite the promise of formal verification in ensuri…
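An example of such a probabilistic specification (notation illustrative): under Gaussian sensor noise the network should retain the desired property with high probability,

\[
\mathbb{P}_{\delta \sim \mathcal{N}(0, \sigma^2 I)}\bigl[\, f(x + \delta) \in S \,\bigr] \;\ge\; 1 - \epsilon .
\]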
Contrastive Training for Improved Out-of-Distribution Detection
Reliable detection of out-of-distribution (OOD) inputs is increasingly understood to be a precondition for deployment of machine learning systems. This paper proposes and investigates the use of contrastive training to boost OOD detection …
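One common recipe for turning learned representations into an OOD score, sketched here as an illustration rather than the paper's exact pipeline: fit class-conditional Gaussians to in-distribution features and score new inputs by their minimum Mahalanobis distance.

import numpy as np

def fit_gaussians(features, labels):
    # Per-class means plus a shared (tied) covariance over in-distribution features.
    classes = np.unique(labels)
    means = {c: features[labels == c].mean(axis=0) for c in classes}
    centered = np.vstack([features[labels == c] - means[c] for c in classes])
    cov = np.cov(centered, rowvar=False) + 1e-6 * np.eye(features.shape[1])
    return means, np.linalg.inv(cov)

def ood_score(x_feat, means, cov_inv):
    # Larger minimum Mahalanobis distance => more likely out-of-distribution.
    return min(float((x_feat - m) @ cov_inv @ (x_feat - m)) for m in means.values())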
Reducing Sentiment Bias in Language Models via Counterfactual Evaluation
Po-Sen Huang, Huan Zhang, Ray Jiang, Robert Stanforth, Johannes Welbl, Jack Rae, Vishal Maini, Dani Yogatama, Pushmeet Kohli. Findings of the Association for Computational Linguistics: EMNLP 2020. 2020.
Advances in language modeling architectures and the availability of large text corpora have driven progress in automatic text generation. While this results in models capable of generating coherent texts, it also prompts models to internal…
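The counterfactual evaluation idea is to swap a sensitive attribute in the conditioning text and compare the sentiment of the model's continuations across the two versions. A minimal sketch in which the generator, sentiment scorer, and gap measure are placeholders rather than the paper's specific choices:

def counterfactual_sentiment_gap(lm_generate, sentiment_score, prompt_template,
                                 attribute_pair, n_samples=100):
    # Generate continuations for a prompt and its counterfactual (attribute swapped),
    # then compare the two sentiment-score distributions.
    a, b = attribute_pair  # e.g. two occupation or country terms
    scores_a = [sentiment_score(lm_generate(prompt_template.format(a))) for _ in range(n_samples)]
    scores_b = [sentiment_score(lm_generate(prompt_template.format(b))) for _ in range(n_samples)]
    # Simple placeholder gap: difference of mean sentiment; the paper's metric may differ.
    return abs(sum(scores_a) / n_samples - sum(scores_b) / n_samples)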
Are Labels Required for Improving Adversarial Robustness?
Recent work has uncovered the interesting (and somewhat surprising) finding that training models to be invariant to adversarial perturbations requires substantially larger datasets than those required for standard classification. This resu…
Achieving Verified Robustness to Symbol Substitutions via Interval Bound Propagation
Neural networks are part of many contemporary NLP systems, yet their empirical successes come at the price of vulnerability to adversarial attacks. Previous work has used adversarial training and data augmentation to partially mitigate suc…
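For word-substitution attacks, a natural input region for bound propagation is an axis-aligned box over the embeddings of each word and its allowed substitutes; a small sketch under that assumption (function names are illustrative):

import numpy as np

def substitution_box(embed, word, substitutes):
    # Elementwise lower/upper bounds over the embeddings of a word and its allowed substitutes.
    vecs = np.stack([embed(w) for w in [word] + list(substitutes)])
    return vecs.min(axis=0), vecs.max(axis=0)

Interval bound propagation then pushes these per-position boxes through the rest of the network.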
A Dual Approach to Verify and Train Deep Networks
This paper addresses the problem of formally verifying desirable properties of neural networks, i.e., obtaining provable guarantees that neural networks satisfy specifications relating their inputs and outputs (e.g., robustness to bounded …
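The robustness specification mentioned can be written as an explicit input-output property (notation illustrative):

\[
\forall\, \delta \text{ with } \|\delta\|_\infty \le \epsilon:\quad [f(x+\delta)]_y - [f(x+\delta)]_j \ge 0 \quad \text{for all } j \ne y,
\]

i.e., no bounded perturbation of x can make a wrong class outscore the true label y; verification seeks a proof that this holds, rather than merely failing to find a counterexample.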
Adversarial Robustness through Local Linearization
Adversarial training is an effective methodology for training deep neural networks that are robust against adversarial, norm-bounded perturbations. However, the computational cost of adversarial training grows prohibitively as the size of …
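One way to write the local-linearity measure that this style of regularizer penalizes (notation assumed rather than quoted):

\[
\gamma(\epsilon, x) \;=\; \max_{\|\delta\| \le \epsilon} \bigl| \ell(x + \delta; y) - \ell(x; y) - \delta^{\top} \nabla_x \ell(x; y) \bigr|,
\]

which is small exactly when the loss surface is close to linear around x, so cheap single-step attacks remain informative and the cost of full multi-step adversarial training can be reduced.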
Towards Stable and Efficient Training of Verifiably Robust Neural Networks
Training neural networks with verifiable robustness guarantees is challenging. Several existing approaches utilize linear relaxation based neural network output bounds under perturbation, but they can slow down training by a factor of hund…
Verification of Non-Linear Specifications for Neural Networks
Prior work on neural network verification has focused on specifications that are linear functions of the output of the network, e.g., invariance of the classifier output under adversarial perturbations of the input. In this paper, we exten…
Achieving Verified Robustness to Symbol Substitutions via Interval Bound Propagation
Po-Sen Huang, Robert Stanforth, Johannes Welbl, Chris Dyer, Dani Yogatama, Sven Gowal, Krishnamurthy Dvijotham, Pushmeet Kohli. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th Internationa…
Strength in Numbers: Trading-off Robustness and Computation via Adversarially-Trained Ensembles
While deep learning has led to remarkable results on a number of challenging problems, researchers have discovered a vulnerability of neural networks in adversarial settings, where small but carefully chosen perturbations to the input can …
On the Effectiveness of Interval Bound Propagation for Training Verifiably Robust Models
Recent work has shown that it is possible to train deep neural networks that are provably robust to norm-bounded adversarial perturbations. Most of these methods are based on minimizing an upper bound on the worst-case loss over all possib…
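The core primitive in this approach, interval bound propagation, pushes elementwise lower/upper bounds through each layer. A minimal numpy sketch for an affine layer followed by a ReLU (names illustrative):

import numpy as np

def ibp_affine(l, u, W, b):
    # Propagate the box [l, u] through x -> W x + b using its center/radius form.
    center, radius = (u + l) / 2.0, (u - l) / 2.0
    new_center = W @ center + b
    new_radius = np.abs(W) @ radius
    return new_center - new_radius, new_center + new_radius

def ibp_relu(l, u):
    # ReLU is monotone, so the bounds pass through elementwise.
    return np.maximum(l, 0.0), np.maximum(u, 0.0)

Training then minimizes an upper bound on the worst-case loss computed from the bounds at the output layer.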
Training verified learners with learned verifiers
This paper proposes a new algorithmic framework, predictor-verifier training, to train neural networks that are verifiable, i.e., networks that provably satisfy some desired input-output properties. The key idea is to simultaneously train …
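Schematically, the two networks are coupled in a single objective; a heavily simplified sketch of the kind of joint loss involved (the weighting κ and the exact form of the verifier's bound are assumptions, not taken from the paper):

\[
\min_{\theta, \phi}\; \mathbb{E}_{(x,y)}\Bigl[ (1-\kappa)\, \ell\bigl(f_\theta(x), y\bigr) + \kappa\, \mathcal{B}_\phi\bigl(f_\theta, x, y, \epsilon\bigr) \Bigr],
\]

where f_θ is the predictor and B_φ is the verifier's valid upper bound on the worst-case loss over the ε-ball around x, so the predictor learns to be easy to verify while the verifier learns to make its bound tight.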
A Dual Approach to Scalable Verification of Deep Networks
This paper addresses the problem of formally verifying desirable properties of neural networks, i.e., obtaining provable guarantees that neural networks satisfy specifications relating their inputs and outputs (robustness to bounded norm a…