Chris J. Maddison
LM Agents May Fail to Act on Their Own Risk Knowledge
Language model (LM) agents have demonstrated significant potential for automating real-world tasks, yet they pose a diverse array of potential, severe risks in safety-critical scenarios. In this work, we identify a significant gap between …
Reasoning to Learn from Latent Thoughts
Compute scaling for language model (LM) pretraining has outpaced the growth of human-written texts, leading to concerns that data will become the bottleneck to LM scaling. To continue scaling pretraining in this data-constrained regime, we…
k-Nearest Neighbour Adaptive Sampling (kNN-AS), a Simple Tool to Efficiently Explore Conformational Space
Molecular dynamics (MD) simulations are computationally expensive, a limiting factor when simulating biomolecular systems. Adaptive sampling approaches can accelerate the exploration of conformational space by running repeated short MD sim…
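Although the abstract is truncated, the method named in the title admits a compact sketch. Below is a minimal, hypothetical implementation of the restart-selection step, assuming frames are featurized into collective variables: frames with the largest k-th nearest-neighbor distance are treated as the least-sampled and seed the next round of short simulations. `featurize` and `run_short_md` are stand-ins, not the paper's code.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def select_restart_frames(features, k=10, n_restarts=8):
    """Pick the frames lying in the least-sampled regions of feature space.

    features : (n_frames, n_dims) array of collective variables.
    Returns indices of the n_restarts frames with the largest k-th
    nearest-neighbor distance (a proxy for low sampling density).
    """
    nn = NearestNeighbors(n_neighbors=k + 1).fit(features)
    dists, _ = nn.kneighbors(features)   # dists[:, 0] is the self-distance (0)
    kth_dist = dists[:, -1]              # distance to the k-th true neighbor
    return np.argsort(kth_dist)[-n_restarts:]

# Hypothetical adaptive-sampling loop; featurize / run_short_md are stand-ins:
# for _ in range(n_rounds):
#     seeds = select_restart_frames(featurize(frames))
#     frames = np.concatenate([frames, run_short_md(frames[seeds])])
```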
MixMin: Finding Data Mixtures via Convex Minimization
Modern machine learning pipelines are increasingly combining and mixing data from diverse and disparate sources, e.g., pre-training large language models. Yet, finding the optimal data mixture is a challenging and open problem. We formaliz…
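The abstract cuts off before the formalization, but the title suggests a problem of the following shape: choose mixture weights on the probability simplex that minimize a convex objective of the mixed data. A hedged sketch, assuming a fixed per-source proxy loss plus a small entropic regularizer (both our choices, not necessarily the paper's objective):

```python
import numpy as np
from scipy.optimize import minimize

def best_mixture(per_source_losses):
    """Find simplex weights minimizing a convex mixed-loss objective.

    per_source_losses : (n_sources,) mean proxy-model loss on each source.
    With a purely linear objective the optimum sits at a vertex; the
    negative-entropy term is one illustrative way to keep the mixture
    spread out. This is a stand-in, not the paper's exact formulation.
    """
    n = len(per_source_losses)

    def objective(w):
        return w @ per_source_losses + 1e-2 * np.sum(w * np.log(w + 1e-12))

    res = minimize(
        objective,
        x0=np.full(n, 1.0 / n),
        bounds=[(0.0, 1.0)] * n,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
        method="SLSQP",
    )
    return res.x

print(best_mixture(np.array([0.9, 0.4, 0.7])))
```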
On the Efficiency of ERM in Feature Learning
Given a collection of feature maps indexed by a set $\mathcal{T}$, we study the performance of empirical risk minimization (ERM) on regression problems with square loss over the union of the linear classes induced by these feature maps. Th…
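In our notation (the paper's may differ), the procedure under study is

$$\hat{f} \;=\; \operatorname*{arg\,min}_{t \in \mathcal{T},\; w} \; \frac{1}{n} \sum_{i=1}^{n} \bigl( \langle w, \phi_t(x_i) \rangle - y_i \bigr)^2,$$

i.e., empirical risk minimization jointly over the feature-map index $t$ and the linear predictor $w$ on top of $\phi_t$.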
Boosting the Predictive Power of Protein Representations with a Corpus of Text Annotations
Protein language models are trained to predict amino acid sequences from vast protein databases, while learning to represent proteins as feature vectors. These vector representations have enabled impressive applications, from predicting mu…
End-To-End Causal Effect Estimation from Unstructured Natural Language Data
Knowing the effect of an intervention is critical for human decision-making, but current approaches for causal effect estimation rely on manual data collection and structuring, regardless of the causal assumptions. This increases both the …
APPL: A Prompt Programming Language for Harmonious Integration of Programs and Large Language Model Prompts
Large Language Models (LLMs) have become increasingly capable of handling diverse tasks with the aid of well-crafted prompts and integration of external tools, but as task complexity rises, the workflow involving LLMs can be complicated an…
Minimax Linear Regression under the Quantile Risk
We study the problem of designing minimax procedures in linear regression under the quantile risk. We start by considering the realizable setting with independent Gaussian noise, where for any given noise level and distribution of inputs, …
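For readers unfamiliar with the risk in the title: rather than the expected loss, a procedure $\hat{\beta}$ is judged by a quantile of its random loss. One standard formalization, which we assume is close to the paper's:

$$\mathcal{R}_q(\hat{\beta}) \;=\; \inf \bigl\{ r : \Pr\bigl( \ell(\hat{\beta}; D) \le r \bigr) \ge q \bigr\}, \qquad q \in (0, 1),$$

where $\ell(\hat{\beta}; D)$ is the loss incurred on a random dataset $D$; the minimax problem then seeks $\inf_{\hat{\beta}} \sup \mathcal{R}_q(\hat{\beta})$ over the problem class.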
Test-Time Fairness and Robustness in Large Language Models
Frontier Large Language Models (LLMs) can be socially discriminatory or sensitive to spurious features of their inputs. Because only well-resourced corporations can train frontier LLMs, we need robust test-time strategies to control such b…
MixMax: Distributional Robustness in Function Space via Optimal Data Mixtures
Machine learning models are often required to perform well across several pre-defined settings, such as a set of user groups. Worst-case performance is a common metric to capture this requirement, and is the objective of group distribution…
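The connection between the worst-case objective and data mixtures is the standard identity behind the title: because the group-DRO objective is linear in the mixture weights, the worst group equals the worst mixture,

$$\min_{f} \max_{g \in \mathcal{G}} \, \mathbb{E}_{P_g}\bigl[\ell(f)\bigr] \;=\; \min_{f} \max_{\lambda \in \Delta} \, \mathbb{E}_{P_\lambda}\bigl[\ell(f)\bigr], \qquad P_\lambda = \sum_{g \in \mathcal{G}} \lambda_g P_g,$$

so searching for an optimal data mixture is another route to worst-case robustness.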
Observational Scaling Laws and the Predictability of Language Model Performance
Understanding how language model performance varies with scale is critical to benchmark and algorithm development. Scaling laws are one approach to building this understanding, but the requirement of training models across many different s…
Experts Don't Cheat: Learning What You Don't Know By Predicting Pairs
Identifying how much a model $\widehat{p}_\theta(Y|X)$ knows about the stochastic real-world process $p(Y|X)$ it was trained on is important to ensure it avoids producing incorrect or "hallucinated" answers or taking unsafe actions. But this …
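The title's pair-prediction idea rests on a simple identity, stated here in our own notation (the paper's estimator may differ): if two answers are drawn independently from the true process, a model that genuinely knows $p(Y|X)$ gains nothing by peeking at the first answer when predicting the second,

$$p(y_2 \mid x, y_1) \;=\; p(y_2 \mid x) \qquad \text{for } Y_1, Y_2 \overset{\text{iid}}{\sim} p(\cdot \mid x),$$

so the gap between a pair model's conditional $\widehat{p}_\theta(y_2 \mid x, y_1)$ and its marginal $\widehat{p}_\theta(y_2 \mid x)$, its tendency to "cheat", signals what it does not know.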
Identifying the Risks of LM Agents with an LM-Emulated Sandbox
Recent advances in Language Model (LM) agents and tool use, exemplified by applications like ChatGPT Plugins, enable a rich set of capabilities but also amplify potential risks, such as leaking private data or causing financial losses. Id…
Probabilistic Invariant Learning with Randomized Linear Classifiers
Designing models that are both expressive and preserve known invariances of tasks is an increasingly hard problem. Existing solutions tradeoff invariance for computational or memory resources. In this work, we show how to leverage randomne…
The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit
In deep learning theory, the covariance matrix of the representations serves as a proxy to examine the network's trainability. Motivated by the success of Transformers, we study the covariance matrix of a modified Softmax-based attention m…
Benchmarking Neural Network Training Algorithms
Training algorithms, broadly construed, are an essential part of every deep learning pipeline. Training algorithm improvements that speed up training across a wide variety of workloads (e.g., better update rules, tuning protocols, learning…
Contrastive Learning Can Find An Optimal Basis For Approximately View-Invariant Functions
Contrastive learning is a powerful framework for learning self-supervised representations that generalize well to downstream supervised tasks. We show that multiple existing contrastive learning methods can be reinterpreted as learning ker…
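One object that several contrastive objectives can be read as approximating, on our understanding of this line of work, is the positive-pair kernel

$$K_{+}(x, x') \;=\; \frac{p_{+}(x, x')}{p(x)\, p(x')},$$

where $p_{+}$ is the distribution of augmented positive pairs and $p$ its marginal; the top eigenfunctions of such a kernel form a natural basis for functions that vary little across views.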
Learning To Cut By Looking Ahead: Cutting Plane Selection via Imitation Learning
Cutting planes are essential for solving mixed-integer linear problems (MILPs), because they facilitate bound improvements on the optimal solution value. For selecting cuts, modern solvers rely on manually designed heuristics that are tune…
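The lookahead expert suggested by the title can be sketched in a few lines. Here `solve_lp_relaxation` and `featurize_cut` are hypothetical stand-ins for a real MILP solver interface, not the paper's code:

```python
def lookahead_label(milp, candidate_cuts, solve_lp_relaxation):
    """Pick the cut whose addition most improves the LP relaxation bound.

    solve_lp_relaxation(milp, extra_cuts) -> LP optimal value (hypothetical
    solver hook). For a minimization MILP the LP value is a lower bound on
    the optimum, so a larger value after adding a cut is a tighter bound.
    """
    base = solve_lp_relaxation(milp, extra_cuts=[])
    gains = [solve_lp_relaxation(milp, extra_cuts=[cut]) - base
             for cut in candidate_cuts]
    return max(range(len(candidate_cuts)), key=gains.__getitem__)

# Imitation learning then fits a scorer on (cut features, expert index)
# pairs so that ranking cuts no longer needs the expensive lookahead:
# X.append([featurize_cut(milp, cut) for cut in cuts])   # hypothetical
# y.append(lookahead_label(milp, cuts, solve_lp_relaxation))
```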
The Machine Learning for Combinatorial Optimization Competition (ML4CO): Results and Insights
Combinatorial optimization is a well-established area in operations research and computer science. Until recently, its methods have focused on solving problem instances in isolation, ignoring that they often stem from related data distribu…
Augment with Care: Contrastive Learning for Combinatorial Problems
Supervised learning can improve the design of state-of-the-art solvers for combinatorial problems, but labelling large numbers of combinatorial instances is often impractical due to exponential worst-case complexity. Inspired by the recent…
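For SAT-style instances, "care" means augmentations that provably preserve the label. Two such maps are sketched below on a DIMACS-style clause list: renaming variables and flipping a variable's polarity everywhere, both of which preserve satisfiability. This is our illustration of the idea, not the paper's exact augmentation set.

```python
import random

def augment_cnf(clauses, n_vars, rng=None):
    """Satisfiability-preserving augmentation of a CNF formula.

    clauses : list of clauses, each a list of nonzero ints in DIMACS
              style (literal v > 0 means x_v, v < 0 means NOT x_v).
    Renaming variables and globally flipping a variable's polarity are
    bijections on assignments, so the SAT/UNSAT label is unchanged.
    """
    rng = rng or random.Random(0)
    perm = list(range(1, n_vars + 1))
    rng.shuffle(perm)                                   # variable renaming
    flip = [rng.random() < 0.5 for _ in range(n_vars)]  # polarity flips

    def map_literal(lit):
        v = abs(lit)
        negated = (lit < 0) ^ flip[v - 1]
        return -perm[v - 1] if negated else perm[v - 1]

    return [[map_literal(lit) for lit in clause] for clause in clauses]

print(augment_cnf([[1, -2], [2, 3], [-1, -3]], n_vars=3))
```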
Bayesian Nonparametrics for Offline Skill Discovery
Skills or low-level policies in reinforcement learning are temporally extended actions that can speed up learning and enable complex behaviours. Recent work in offline reinforcement learning and imitation learning has proposed several tech…
Optimal Representations for Covariate Shift
Machine learning systems often experience a distribution shift between training and testing. In this paper, we introduce a simple variational objective whose optima are exactly the set of all representations on which risk minimizers are gu…
Learning Generalized Gumbel-max Causal Mechanisms
To perform counterfactual reasoning in Structural Causal Models (SCMs), one needs to know the causal mechanisms, which provide factorizations of conditional distributions into noise sources and deterministic functions mapping realizations …
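The classical Gumbel-max mechanism that this work generalizes is easy to state in code: sample a categorical outcome by perturbing log-probabilities with Gumbel noise, then answer counterfactual queries by reusing the same noise under different logits.

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_max_sample(log_probs, g):
    """Categorical sample via the Gumbel-max trick: argmax(log p + g)."""
    return int(np.argmax(log_probs + g))

# Factual world: outcome under treatment A.
g = rng.gumbel(size=3)                        # shared exogenous noise
log_p_A = np.log(np.array([0.7, 0.2, 0.1]))
y_factual = gumbel_max_sample(log_p_A, g)

# Counterfactual query: same unit (same g), treatment B instead.
log_p_B = np.log(np.array([0.3, 0.5, 0.2]))
y_counterfactual = gumbel_max_sample(log_p_B, g)

print(y_factual, y_counterfactual)
```

Gumbel-max is only one of many mechanisms consistent with the same interventional distributions, which is what motivates learning generalized alternatives.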
Unbiased Gradient Estimation with Balanced Assignments for Mixtures of Experts
Training large-scale mixture of experts models efficiently on modern hardware requires assigning datapoints in a batch to different experts, each with a limited capacity. Recently proposed assignment procedures lack a probabilistic interpr…
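One probabilistic balanced-assignment routine consistent with the abstract's setup (our sketch, not necessarily the paper's estimator): perturb the gate log-probabilities with Gumbel noise, then solve a matching in which each expert is replicated up to its capacity. The unbiased-gradient analysis is the paper's contribution and is beyond this snippet.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def balanced_assign(gate_logits, capacity, rng):
    """Randomly assign a batch to experts, respecting per-expert capacity.

    gate_logits : (batch, n_experts) unnormalized gate scores.
    Adds Gumbel noise (making the assignment stochastic), replicates each
    expert `capacity` times, and solves a min-cost matching so every
    datapoint gets an expert and no expert exceeds its capacity.
    """
    batch, n_experts = gate_logits.shape
    assert batch <= n_experts * capacity
    noisy = gate_logits + rng.gumbel(size=gate_logits.shape)
    cost = -np.repeat(noisy, capacity, axis=1)   # columns = expert slots
    rows, cols = linear_sum_assignment(cost)
    return cols[np.argsort(rows)] // capacity    # expert index per datapoint

rng = np.random.default_rng(0)
print(balanced_assign(rng.normal(size=(8, 4)), capacity=2, rng=rng))
```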
Lossy Compression for Lossless Prediction
Most data is automatically collected and only ever "seen" by algorithms. Yet, data compressors preserve perceptual fidelity rather than just the information needed by algorithms performing downstream tasks. In this paper, we characterize t…