Fanny Yang
ROC-n-reroll: How verifier imperfection affects test-time scaling
Test-time scaling aims to improve language model performance by leveraging additional compute during inference. Many works have empirically studied techniques such as Best-of-N (BoN) and Rejection Sampling (RS) that make use of a verifier …
Learning Pareto manifolds in high dimensions: How can regularization help?
Simultaneously addressing multiple objectives is becoming increasingly important in modern machine learning. At the same time, data is often high-dimensional and costly to label. For a single objective such as prediction risk, conventional…
Efficient Randomized Experiments Using Foundation Models
Randomized experiments are the preferred approach for evaluating the effects of interventions, but they are costly and often yield estimates with substantial uncertainty. On the other hand, in silico experiments leveraging foundation model…
Achievable distributional robustness when the robust risk is only partially identified
In safety-critical applications, machine learning models should generalize well under worst-case distribution shifts, that is, have a small robust risk. Invariance-based algorithms can provably take advantage of structural assumptions on t…
Atmospheric Transport Modeling of CO$_2$ with Neural Networks
Accurately describing the distribution of CO$_2$ in the atmosphere with atmospheric tracer transport models is essential for greenhouse gas monitoring and verification support systems to aid implementation of international climate agreements. Lar…
Copyright-Protected Language Generation via Adaptive Model Fusion
The risk of language models reproducing copyrighted material from their training data has led to the development of various protective measures. Among these, inference-time strategies that impose constraints via post-processing have shown …
Strong Copyright Protection for Language Models via Adaptive Model Fusion
The risk of language models unintentionally reproducing copyrighted material from their training data has led to the development of various protective measures. In this paper, we propose model fusion as an effective solution to safeguard a…
Detecting critical treatment effect bias in small subgroups
Randomized trials are considered the gold standard for making informed decisions in medicine, yet they often lack generalizability to the patient populations in clinical practice. Observational studies, on the other hand, cover a broader p…
Mini-Workshop: Interpolation and Over-parameterization in Statistics and Machine Learning
In recent years it has become clear that, contrary to traditional statistical beliefs, methods that interpolate (fit exactly) the noisy training data can still be statistically optimal. In particular, this phenomenon of “benign overfitt…
Graph Neural Networks for Atmospheric Transport Modeling of CO$_2$
Large deep neural network emulators are poised to revolutionize numerical weather prediction (NWP). Recent models like GraphCast or NeuralGCM can now compete and sometimes outperform traditional NWP systems, all at much lower computational…
Privacy-preserving data release leveraging optimal transport and particle gradient descent
We present a novel approach for differentially private data synthesis of protected tabular datasets, a relevant task in highly sensitive domains such as healthcare and government. Current state-of-the-art methods predominantly use marginal…
Robust Mixture Learning when Outliers Overwhelm Small Groups
We study the problem of estimating the means of well-separated mixtures when an adversary may add arbitrary outliers. While strong guarantees are available when the outlier fraction is significantly smaller than the minimum mixing weight, …
Hidden yet quantifiable: A lower bound for confounding strength using randomized trials
In the era of fast-paced precision medicine, observational studies play a major role in properly evaluating new treatments in clinical practice. Yet, unobserved confounding can significantly compromise causal conclusions drawn from non-ran…
Can semi-supervised learning use all the data effectively? A lower bound perspective
Prior works have shown that semi-supervised learning algorithms can leverage unlabeled data to improve over the labeled sample complexity of supervised learning (SL) algorithms. However, existing theoretical analyses focus on regimes where…
How robust accuracy suffers from certified training with convex relaxations
Adversarial attacks pose significant threats to deploying state-of-the-art classifiers in safety-critical applications. Two classes of methods have emerged to address this issue: empirical defences and certified defences. Although certifie…
PILLAR: How to make semi-private learning more effective
In Semi-Supervised Semi-Private (SP) learning, the learner has access to both public unlabelled and private labelled data. We propose a computationally efficient algorithm that, under mild assumptions on the data, provably achieves signifi…
Certified private data release for sparse Lipschitz functions
As machine learning has become more relevant for everyday applications, a natural requirement is the protection of the privacy of the training data. When the relevant learning questions are unknown in advance, or hyper-parameter tuning pla…
Strong inductive biases provably prevent harmless interpolation
Classical wisdom suggests that estimators should avoid fitting noise to achieve good generalization. In contrast, modern overparameterized models can yield small test error despite interpolating noise -- a phenomenon often called "benign o…
Tight bounds for maximum $\ell_1$-margin classifiers
Popular iterative algorithms such as boosting methods and coordinate descent on linear models converge to the maximum $\ell_1$-margin classifier, a.k.a. sparse hard-margin SVM, in high dimensional regimes where the data is linearly separab…
Margin-based sampling in high dimensions: When being active is less efficient than staying passive
It is widely believed that given the same labeling budget, active learning (AL) algorithms like margin-based active learning achieve better predictive performance than passive learning (PL), albeit at a higher computational cost. Recent em…
How unfair is private learning?
As machine learning algorithms are deployed on sensitive data in critical decision making processes, it is becoming increasingly important that they are also private and fair. In this paper, we show that, when the data has a long-tailed st…
Provable concept learning for interpretable predictions using variational autoencoders
In safety-critical applications, practitioners are reluctant to trust neural networks when no interpretable explanations are available. Many attempts to provide such explanations revolve around pixel-based attributions or use previously kn…
Fast Rates for Noisy Interpolation Require Rethinking the Effects of Inductive Bias
Good generalization performance on high-dimensional data crucially hinges on a simple structure of the ground truth and a corresponding strong inductive bias of the estimator. Even though this intuition is valid for regularized models, in …
Why adversarial training can hurt robust accuracy
Machine learning classifiers with high test accuracy often perform poorly under adversarial attacks. It is commonly believed that adversarial training alleviates this issue. In this paper, we demonstrate that, surprisingly, the opposite ma…
Tight bounds for minimum $\ell_1$-norm interpolation of noisy data
We provide matching upper and lower bounds of order $\sigma^2/\log(d/n)$ for the prediction error of the minimum $\ell_1$-norm interpolator, a.k.a. basis pursuit. Our result is tight up to negligible terms when $d \gg n$, and is the first to im…
Self-supervised Reinforcement Learning with Independently Controllable Subgoals
To successfully tackle challenging manipulation tasks, autonomous agents must learn a diverse set of skills and how to combine them. Recently, self-supervised agents that set their own abstract goals by exploiting the discovered structure …
Interpolation can hurt robust generalization even when there is no noise
Numerous recent works show that overparameterization implicitly reduces variance for min-norm interpolators and max-margin classifiers. These findings suggest that ridge regularization has vanishing benefits in high dimensions. We challeng…
How rotational invariance of common kernels prevents generalization in high dimensions
Kernel ridge regression is well-known to achieve minimax optimal rates in low-dimensional settings. However, its behavior in high dimensions is much less understood. Recent work establishes consistency for kernel regression under certain a…