Thomas Icard
Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors
Interpretability research now offers a variety of techniques for identifying abstract internal mechanisms in neural networks. Can such techniques be used to predict how models will behave on out-of-distribution examples? In this work, we p…
A Communication-First Account of Explanation
This paper develops a formal account of causal explanation, grounded in a theory of conversational pragmatics, and inspired by the interventionist idea that explanation is about asking and answering what-if-things-had-been-different questi…
Looking back to plan ahead: Causal judgments as a sampling approximation for action effectiveness
Throughout human thought and discourse, we make judgments of how much certain particular events caused others: For instance, we judge that a product sold because of its viral ad campaign more than because of its celebrity endorsement, or v…
When AI meets counterfactuals: the ethical implications of counterfactual world simulation models
This paper examines the transformative potential of AI embedded with counterfactual world simulation models (CWSMs). A CWSM uses multimodal evidence, such as the CCTV footage of a road accident, to build a high-fidelity 3D reconstruction o…
Modeling Discrimination with Causal Abstraction
A person is directly racially discriminated against only if her race caused her worse treatment. This implies that race is an attribute sufficiently separable from other attributes to isolate its causal role. But race is embedded in a nexu…
Belief in the Machine: Investigating Epistemological Blind Spots of Language Models
As language models (LMs) become integral to fields like healthcare, law, and journalism, their ability to differentiate between fact, belief, and knowledge is essential for reliable decision-making. Failure to grasp these distinctions can …
Anticipating the Risks and Benefits of Counterfactual World Simulation Models (Extended Abstract)
This paper examines the transformative potential of Counterfactual World Simulation Models (CWSMs). CWSMs use pieces of multi-modal evidence, such as the CCTV footage or sound recordings of a road accident, to build a high-fidelity 3D reco…
Do as I explain: Explanations communicate optimal interventions
People often select only a few events when explaining what happened. What drives people's explanation selection? Prior research argued that people's explanation choices are affected by event normality and causal structure. Here, we propose…
On Probabilistic and Causal Reasoning with Summation Operators
Ibeling et al. (2023) axiomatize increasingly expressive languages of causation and probability, and Mosse et al. (2024) show that reasoning (specifically the satisfiability problem) in each causal language is as difficult, from a computa…
A Reply to Makelov et al. (2023)'s "Interpretability Illusion" Arguments
We respond to the recent paper by Makelov et al. (2023), which reviews subspace interchange intervention methods like distributed alignment search (DAS; Geiger et al. 2023) and claims that these methods potentially cause "interpretability …
Anticipating the risks and benefits of counterfactual world simulation models
This paper examines the transformative potential of Counterfactual World Simulation Models (CWSMs). CWSMs use pieces of multi-modal evidence, such as the CCTV footage or sound recordings of a road accident, to build a high-fidelity 3D reco…
Probing the quantitative–qualitative divide in probabilistic reasoning
This paper explores the space of (propositional) probabilistic logical languages, ranging from a purely ‘qualitative’ comparative language to a highly ‘quantitative’ language involving arbitrary polynomials over probability terms. While ta…
Probing the Quantitative-Qualitative Divide in Probabilistic Reasoning
This paper explores the space of (propositional) probabilistic logical languages, ranging from a purely ‘qualitative’ comparative language to a highly ‘quantitative’ language involving arbitrary polynomials over probability terms. While ta…
Comparing Causal Frameworks: Potential Outcomes, Structural Models, Graphs, and Abstractions
The aim of this paper is to make clear and precise the relationship between the Rubin causal model (RCM) and structural causal model (SCM) frameworks for causal inference. Adopting a neutral logical perspective, and drawing on previous wor…
A Semantics for Causing, Enabling, and Preventing Verbs Using Structural Causal Models
When choosing how to describe what happened, we have a number of causal verbs at our disposal. In this paper, we develop a model-theoretic formal semantics for nine causal verbs that span the categories of CAUSE, ENABLE, and PREVENT. We us…
Show and tell: Learning causal structures from observations and explanations
There are at least three ways of learning how the world works: learning from observations, from interventions, and from explanations. Prior work on causal inference focused on how people learn causal structures through observation and inte…
Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations
Causal abstraction is a promising theoretical framework for explainable artificial intelligence that defines when an interpretable high-level causal model is a faithful simplification of a low-level deep learning system. However, existing …
Causal Abstraction: A Theoretical Foundation for Mechanistic Interpretability
Causal abstraction provides a theoretical foundation for mechanistic interpretability, the field concerned with providing intelligible algorithms that are faithful simplifications of the known, but opaque low-level details of black box AI …
Causal Abstraction with Soft Interventions
Causal abstraction provides a theory describing how several causal models can represent the same system at different levels of detail. Existing theoretical proposals limit the analysis of abstract models to "hard" interventions fixing caus…
Holistic Evaluation of Language Models
Language models (LMs) are becoming the foundation for almost all major language technologies, but their capabilities, limitations, and risks are not well understood. We present Holistic Evaluation of Language Models (HELM) to improve the t…
A Completeness Result for Inequational Reasoning in a Full Higher-Order Setting
This paper obtains a completeness result for inequational reasoning with applicative terms without variables in a setting where the intended semantic models are the full structures, the full type hierarchies over preorders for the base typ…
Causal Distillation for Language Models
Distillation efforts have led to language models that are more compact and efficient without serious drops in performance. The standard approach to distillation trains a student model against two objectives: a task-specific objective (e.g.…
Inducing Causal Structure for Interpretable Neural Networks
In many areas, we have well-founded insights about causal structure that would be useful to bring into our trained models while still allowing them to learn in a data-driven fashion. To achieve this, we present the new method of interchang…
Is Causal Reasoning Harder than Probabilistic Reasoning?
Many tasks in statistical and causal inference can be construed as problems of entailment in a suitable formal language. We ask whether those problems are more difficult, from a computational perspective, for causal probabili…
An interaction effect of norm violations on causal judgment
Existing research has shown that norm violations influence causal judgments, and a number of different models have been developed to explain these effects. One such model, the necessity/sufficiency model, predicts an interaction pattern in…
On the Opportunities and Risks of Foundation Models
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their…
A Topological Perspective on Causal Inference
This paper presents a topological learning-theoretic perspective on causal inference by introducing a series of topologies defined on general spaces of structural causal models (SCMs). As an illustration of the framework we prove a topolog…
Causal Abstractions of Neural Networks
Structural analysis methods (e.g., probing and feature attribution) are increasingly important tools for neural network analysis. We propose a new structural analysis method grounded in a formal theory of causal abstraction that provides r…
Provability and interpretability logics with restricted realizations
The provability logic of a theory T is the set of modal formulas, which under any arithmetical realization are provable in T. We slightly modify this notion by requiring the arithmetical realizations to come from a specified set Γ. We m…
Inference from explanation
What do we communicate with causal explanations? Upon being told, "E because C", one might learn that C and E both occurred, and perhaps that there is a causal relationship between C and E. In fact, causal explanations systematically discl…