Sören Mindermann
Agentic Misalignment: How LLMs Could Be Insider Threats
We stress-tested 16 leading models from multiple developers in hypothetical corporate environments to identify potentially risky agentic behaviors before they cause real harm. In the scenarios, we allowed models to autonomously send emails…
Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?
The leading AI companies are increasingly focused on building generalist AI agents—systems that can autonomously plan, act, and pursue goals across almost all tasks that humans can perform. Despite how useful these systems might be, unchec…
The Singapore Consensus on Global AI Safety Research Priorities
Rapidly improving AI capabilities and autonomy hold significant promise of transformation, but are also driving vigorous debate on how to ensure that AI is safe, i.e., trustworthy, reliable, and secure. Building a trusted ecosystem is ther…
International AI Safety Report
The first International AI Safety Report comprehensively synthesizes the current evidence on the capabilities, risks, and safety of advanced AI systems. The report was mandated by the nations attending the AI Safety Summit in Bletchley, UK…
Open Problems in Machine Unlearning for AI Safety
As AI systems become more capable, widely deployed, and increasingly autonomous in critical areas such as cybersecurity, biological research, and healthcare, ensuring their safety and alignment with human values is paramount. Machine unlea…
Alignment faking in large language models
We present a demonstration of a large language model engaging in alignment faking: selectively complying with its training objective in training to prevent modification of its behavior out of training. First, we give Claude 3 Opus a system…
International Scientific Report on the Safety of Advanced AI (Interim Report)
This is the interim publication of the first International Scientific Report on the Safety of Advanced AI. The report synthesises the scientific understanding of general-purpose AI -- AI that can perform a wide variety of tasks -- with a f…
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Humans are capable of strategically deceptive behavior: behaving helpfully in most situations, but then behaving very differently in order to pursue alternative objectives when given the opportunity. If an AI system learned such a deceptiv…
Managing extreme AI risks amid rapid progress
Artificial Intelligence (AI) is progressing rapidly, and companies are shifting their focus to developing generalist AI systems that can autonomously act and pursue goals. Increases in capabilities and autonomy may soon massively amplify A…
Specific versus General Principles for Constitutional AI
Human feedback can prevent overtly harmful utterances in conversational models, but may not automatically mitigate subtle problematic behaviors such as a stated desire for self-preservation or power. Constitutional AI offers an alternative…
How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions
Large language models (LLMs) can "lie", which we define as outputting false statements despite "knowing" the truth in a demonstrable sense. LLMs might "lie", for example, when instructed to output misinformation. Here, we develop a simple …
Effectiveness assessment of non-pharmaceutical interventions: lessons learned from the COVID-19 pandemic
Effectiveness of non-pharmaceutical interventions (NPIs), such as school closures and stay-at-home orders, during the COVID-19 pandemic has been assessed in many studies. Such assessments can inform public health policies and contribute to…
Seasonal variation in SARS-CoV-2 transmission in temperate climates: A Bayesian modelling study in 143 European regions
Although seasonal variation has a known influence on the transmission of several respiratory viral infections, its role in SARS-CoV-2 transmission remains unclear. While there is a sizable and growing literature on environmental drivers of…
Prioritized Training on Points that are Learnable, Worth Learning, and Not Yet Learnt
Training on web-scale data can take months. But most computation and time is wasted on redundant and noisy points that are already learnt or not learnable. To accelerate training, we introduce Reducible Holdout Loss Selection (RHO-LOSS), a…
Mask wearing in community settings reduces SARS-CoV-2 transmission
Significance: We resolve conflicting results regarding mask wearing against COVID-19. Most previous work focused on mask mandates; we study the effect of mask wearing directly. We find that population mask wearing notably reduced SARS-CoV-2…
Understanding the effectiveness of government interventions against the resurgence of COVID-19 in Europe
European governments use non-pharmaceutical interventions (NPIs) to control resurging waves of COVID-19. However, they only have outdated estimates for how effective individual NPIs were in the first wave. We estimate the effectiveness of …
Is the cure really worse than the disease? The health impacts of lockdowns during COVID-19
During the pandemic, there has been ongoing and contentious debate around the impact of restrictive government measures to contain SARS-CoV-2 outbreaks, often termed ‘lockdowns’. We define a ‘lockdown’ as a highly restrictive set…
Prioritized training on points that are learnable, worth learning, and not yet learned (workshop version)
We introduce Goldilocks Selection, a technique for faster model training which selects a sequence of training points that are "just right". We propose an information-theoretic acquisition function -- the reducible validation loss -- and co…
Prioritized training on points that are learnable, worth learning, and not yet learned.
We introduce Goldilocks Selection, a technique for faster model training which selects a sequence of training points that are just right. We propose an information-theoretic acquisition function -- the reducible validation loss -- and comp…
Mass mask-wearing notably reduces COVID-19 transmission
Mask-wearing has been a controversial measure to control the COVID-19 pandemic. While masks are known to substantially reduce disease transmission in healthcare settings [1–3], studies in community settings report inconsistent results [4–6…
Seasonal variation in SARS-CoV-2 transmission in temperate climates
While seasonal variation has a known influence on the transmission of several respiratory viral infections, its role in SARS-CoV-2 transmission remains unclear. As previous analyses have not accounted for the implementation of non-pharmace…
Understanding the effectiveness of government interventions in Europe’s second wave of COVID-19
As European governments face resurging waves of COVID-19, non-pharmaceutical interventions (NPIs) continue to be the primary tool for infection control. However, updated estimates of their relative effectiveness have been absent for Europe…
How Robust are the Estimated Effects of Nonpharmaceutical Interventions against COVID-19?
To what extent are effectiveness estimates of nonpharmaceutical interventions (NPIs) against COVID-19 influenced by the assumptions our models make? To answer this question, we investigate 2 state-of-the-art NPI effectiveness models and pr…
Quantifying Ignorance in Individual-Level Causal-Effect Estimates under Hidden Confounding
We study the problem of learning conditional average treatment effects (CATE) from high-dimensional, observational data with unobserved confounders. Unobserved confounders introduce ignorance -- a level of unidentifiability -- about an ind…
A dataset of non-pharmaceutical interventions on SARS-CoV-2 in Europe