Ananda Theertha Suresh
CafeQ: Calibration-free Quantization via Learned Transformations and Adaptive Rounding
Post-training quantization is an effective method for reducing the serving cost of large language models, where the standard approach is to use a round-to-nearest quantization level scheme. However, this often introduces large errors due t…
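As background, a minimal sketch of the round-to-nearest baseline the abstract refers to (symmetric, per-channel); the function and parameter names here are illustrative, not CafeQ's API:

```python
import numpy as np

def rtn_quantize(w: np.ndarray, bits: int = 4):
    """Symmetric per-channel round-to-nearest (RTN) quantization sketch.

    Each row of `w` is scaled onto the signed integer grid
    {-(2^(b-1)-1), ..., 2^(b-1)-1} and rounded to the nearest level;
    dequantization multiplies back by the per-channel scale.
    """
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax + 1e-12
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale  # dequantize via q.astype(np.float32) * scale
```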
Hierarchical Retrieval: The Geometry and a Pretrain-Finetune Recipe
Dual encoder (DE) models, where a pair of matching query and document are embedded into similar vector representations, are widely used in information retrieval due to their simplicity and scalability. However, the Euclidean geometry of th…
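To make the dual-encoder setup concrete, a toy sketch of DE retrieval assuming precomputed embeddings (the function name is illustrative; this is the standard inner-product scoring, not the paper's recipe):

```python
import numpy as np

def retrieve(query_emb: np.ndarray, doc_embs: np.ndarray, k: int = 5):
    """Score every document by inner product with the query embedding
    and return the indices of the k best matches."""
    scores = doc_embs @ query_emb
    return np.argsort(-scores)[:k]
```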
Efficient and Asymptotically Unbiased Constrained Decoding for Large Language Models
In real-world applications of large language models, outputs are often required to be confined: selecting items from predefined product or document sets, generating phrases that comply with safety standards, or conforming to specialized fo…
Rate of Model Collapse in Recursive Training
Given the ease of creating synthetic data from machine learning models, new models can be potentially trained on synthetic data generated by previous models. This recursive training process raises concerns about the long-term impact on mod…
Coupling without Communication and Drafter-Invariant Speculative Decoding
Suppose Alice has a distribution $P$ and Bob has a distribution $Q$. Alice wants to draw a sample $a\sim P$ and Bob a sample $b \sim Q$ such that $a = b$ with as high probability as possible. It is well-known that, by sampling from an o…
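One classical shared-randomness construction in this line of work is weighted MinHash: both parties draw the same i.i.d. Exp(1) sequence $\{e_x\}$ and output $\arg\min_x e_x/p(x)$, which is marginally an exact sample and collides with probability at least $(1 - D_{TV}(P,Q))/(1 + D_{TV}(P,Q))$. A minimal sketch:

```python
import numpy as np

def minhash_sample(probs: np.ndarray, shared_exp: np.ndarray) -> int:
    """Weighted-MinHash sampling: argmin_x e_x / p(x) with shared Exp(1)
    noise. Marginally an exact sample from `probs`; two parties using
    the same `shared_exp` agree with probability >= (1-TV)/(1+TV)."""
    with np.errstate(divide="ignore"):
        return int(np.argmin(shared_exp / probs))

rng = np.random.default_rng(0)
P, Q = np.array([0.5, 0.3, 0.2]), np.array([0.4, 0.4, 0.2])
e = rng.exponential(size=3)                        # shared randomness only
a, b = minhash_sample(P, e), minhash_sample(Q, e)  # often a == b
```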
Private federated discovery of out-of-vocabulary words for Gboard
The vocabulary of language models in Gboard, Google's keyboard application, plays a crucial role for improving user experience. One way to improve the vocabulary is to discover frequently typed out-of-vocabulary (OOV) words on user devices…
Exploring and Improving Drafts in Blockwise Parallel Decoding
Despite the remarkable strides made by autoregressive language models, their potential is often hampered by the slow inference speeds inherent in sequential token generation. Blockwise parallel decoding (BPD) was proposed by Stern et al. a…
Asymptotics of Language Model Alignment
Let $p$ denote a generative language model. Let $r$ denote a reward model that returns a scalar that captures the degree to which a draw from $p$ is preferred. The goal of language model alignment is to alter $p$ to a new distribution $\phi$ …
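For context, the standard KL-regularized formulation in this literature and its well-known closed-form optimum (stated here as background, not as this paper's result):

$$\phi^* = \arg\max_{\phi}\; \mathbb{E}_{y \sim \phi}[r(y)] - \beta\, \mathrm{KL}(\phi \,\|\, p), \qquad \phi^*(y) \propto p(y)\, e^{r(y)/\beta}.$$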
Block Verification Accelerates Speculative Decoding
Speculative decoding is an effective method for lossless acceleration of large language models during inference. It uses a fast model to draft a block of tokens which are then verified in parallel by the target model, and provides a guaran…
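For background, a sketch of the standard token-level verification rule that block verification refines (the paper verifies the drafted block jointly; the per-token baseline below is lossless by construction):

```python
import numpy as np

def verify_token(x, p_target, p_draft, rng):
    """Token-level speculative-decoding verification: accept drafted
    token x with prob min(1, p_tgt(x)/p_drf(x)); otherwise resample
    from the normalized residual max(p_tgt - p_drf, 0). The output is
    distributed exactly as the target model."""
    if rng.random() < min(1.0, p_target[x] / p_draft[x]):
        return x, True
    residual = np.maximum(p_target - p_draft, 0.0)
    residual /= residual.sum()
    return int(rng.choice(len(residual), p=residual)), False
```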
Efficient Language Model Architectures for Differentially Private Federated Learning
Cross-device federated learning (FL) is a technique that trains a model on data distributed across typically millions of edge devices without data leaving the devices. SGD is the standard client optimizer for on-device training in cross-de…
Theoretical guarantees on the best-of-n alignment policy
A simple and effective method for the inference-time alignment and scaling test-time compute of generative models is best-of-$n$ sampling, where $n$ samples are drawn from a reference policy, ranked based on a reward function, and the high…
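The policy itself fits in a few lines (a sketch; `sample` and `reward` stand in for the reference policy and the reward model):

```python
def best_of_n(sample, reward, n: int):
    """Draw n i.i.d. samples from the reference policy and return the
    one the reward model scores highest."""
    return max((sample() for _ in range(n)), key=reward)
```

Its KL divergence from the reference policy is commonly approximated by $\log n - (n-1)/n$; the accuracy of such estimates is among the questions this kind of analysis addresses.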
Mean estimation in the add-remove model of differential privacy
Differential privacy is often studied under two different models of neighboring datasets: the add-remove model and the swap model. While the swap model is frequently used in the academic literature to simplify analysis, many practical appl…
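For completeness, the two neighboring-dataset notions being compared: in the swap model a single record is replaced (dataset size is fixed), while in the add-remove model a single record is added or removed:

$$\text{swap: } D' = (D \setminus \{x\}) \cup \{x'\}, \qquad \text{add-remove: } D' = D \cup \{x\} \ \text{ or } \ D' = D \setminus \{x\}.$$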
Multi-Group Fairness Evaluation via Conditional Value-at-Risk Testing
Machine learning (ML) models used in prediction and classification tasks may display performance disparities across population groups determined by sensitive attributes (e.g., race, sex, age). We consider the problem of evaluating the perf…
SpecTr: Fast Speculative Decoding via Optimal Transport
Autoregressive sampling from large language models has led to state-of-the-art results in several natural language tasks. However, autoregressive sampling generates tokens one at a time, making it slow, and even prohibitive in certain tasks…
Federated Heavy Hitter Recovery under Linear Sketching
Motivated by real-life deployments of multi-round federated analytics with secure aggregation, we investigate the fundamental communication-accuracy tradeoffs of the heavy hitter discovery and approximate (open-domain) histogram problems u…
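For readers unfamiliar with linear sketching: the count-min sketch is a canonical member of this family, and its linearity (the sketch of a sum is the sum of sketches) is what makes per-client sketches compatible with secure aggregation. A minimal illustrative version (a production system would use a proper hash family rather than Python's built-in `hash`):

```python
import numpy as np

class CountMin:
    """Minimal count-min sketch: d rows of w counters."""
    def __init__(self, d: int = 5, w: int = 1024, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.salts = rng.integers(0, 2**31, size=d)
        self.table = np.zeros((d, w), dtype=np.int64)

    def _cols(self, item):
        return [hash((int(s), item)) % self.table.shape[1] for s in self.salts]

    def add(self, item, count: int = 1):
        for row, col in enumerate(self._cols(item)):
            self.table[row, col] += count

    def estimate(self, item) -> int:
        # Min over rows: an overestimate, tight with high probability.
        return min(self.table[row, col] for row, col in enumerate(self._cols(item)))
```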
The importance of feature preprocessing for differentially private linear optimization
Training machine learning models with differential privacy (DP) has received increasing interest in recent years. One of the most popular algorithms for training differentially private models is differentially private stochastic gradient d…
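For reference, the core step of DP-SGD that the abstract alludes to, per-example clipping followed by Gaussian noise (a minimal sketch):

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each per-example gradient to L2 norm `clip_norm`, sum,
    add Gaussian noise with std `noise_multiplier * clip_norm`, and
    average over the batch."""
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=clipped[0].shape)
    return (np.sum(clipped, axis=0) + noise) / len(per_example_grads)
```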
FedYolo: Augmenting Federated Learning with Pretrained Transformers
The growth and diversity of machine learning applications motivate a rethinking of learning with mobile and edge devices. How can we address diverse client goals and learn with scarce heterogeneous data? While federated learning aims to ad…
Subset-Based Instance Optimality in Private Estimation
We propose a new definition of instance optimality for differentially private estimation algorithms. Our definition requires an optimal algorithm to compete, simultaneously for every dataset $D$, with the best private benchmark algorithm t…
Concentration Bounds for Discrete Distribution Estimation in KL Divergence
We study the problem of discrete distribution estimation in KL divergence and provide concentration bounds for the Laplace estimator. We show that the deviation from mean scales as $\sqrt{k}/n$ when $n \ge k$, improving upon the best prior…
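The estimator in question is the classical add-one (Laplace) smoothed frequency: with counts $N_1, \ldots, N_k$ from $n$ samples over a $k$-symbol alphabet,

$$\hat{p}_i = \frac{N_i + 1}{n + k}, \qquad i = 1, \ldots, k.$$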
Private Domain Adaptation from a Public Source
A key problem in a variety of applications is that of domain adaptation from a public source domain, for which a relatively large amount of labeled data with no privacy constraints is at one's disposal, to a private target domain, for whic…
Algorithms for bounding contribution for histogram estimation under user-level privacy
We study the problem of histogram estimation under user-level differential privacy, where the goal is to preserve the privacy of all entries of any single user. We consider the heterogeneous scenario where the quantity of data can be diffe…
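A common baseline in this setting caps each user's contribution before adding noise; choosing the cap is exactly the tension such algorithms address. A minimal sketch (the truncation rule and noise scale here are the textbook choices, not necessarily the paper's):

```python
import collections
import numpy as np

def capped_dp_histogram(user_items, cap, epsilon, rng):
    """User-level DP histogram baseline: keep at most `cap` items per
    user (bounding the L1 sensitivity to `cap`), then add
    Laplace(cap/epsilon) noise to each bin."""
    hist = collections.Counter()
    for items in user_items:
        hist.update(items[:cap])  # contribution bounding
    return {v: c + rng.laplace(0.0, cap / epsilon) for v, c in hist.items()}
```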
Differentially Private Learning with Margin Guarantees
We present a series of new differentially private (DP) algorithms with dimension-independent margin guarantees. For the family of linear hypotheses, we give a pure DP learning algorithm that benefits from relative deviation margin guarante…
Scaling Language Model Size in Cross-Device Federated Learning
Most studies in cross-device federated learning focus on small models, due to the server-client communication and on-device computation bottlenecks. In this work, we leverage various techniques for mitigating these bottlenecks to train lar…
Correlated quantization for distributed mean estimation and optimization
We study the problem of distributed mean estimation and optimization under communication constraints. We propose a correlated quantization protocol whose leading term in the error guarantee depends on the mean deviation of data points rath…
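As background, the uncorrelated baseline is independent dithered (stochastic) rounding per client; correlating that randomness across clients is the paper's departure point. The baseline, sketched:

```python
import numpy as np

def stochastic_round(x: np.ndarray, rng) -> np.ndarray:
    """Unbiased 1-bit stochastic rounding of values in [0, 1]:
    round up with probability equal to the value."""
    return (rng.random(x.shape) < x).astype(np.float64)

# Each client quantizes independently; the server averages the bits.
rng = np.random.default_rng(0)
clients = rng.random((100, 8))   # 100 clients, 8 coordinates in [0, 1]
estimate = np.mean([stochastic_round(c, rng) for c in clients], axis=0)
```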
The Fundamental Price of Secure Aggregation in Differentially Private Federated Learning
We consider the problem of training a $d$-dimensional model with distributed differential privacy (DP) where secure aggregation (SecAgg) is used to ensure that the server only sees the noisy sum of $n$ model updates in every training round…
Proceedings of the First Workshop on Federated Learning for Natural Language Processing (FL4NLP 2022)
In the context of personalized federated learning (FL), the critical challenge is to balance local model improvement and global model tuning when the personal and global objectives may not be exactly aligned. Inspired by Bayesian hierarchic…
Remember What You Want to Forget: Algorithms for Machine Unlearning
We study the problem of unlearning datapoints from a learnt model. The learner first receives a dataset $S$ drawn i.i.d. from an unknown distribution, and outputs a model $\widehat{w}$ that performs well on unseen samples from the same dis…
On the Rényi Differential Privacy of the Shuffle Model
The central question studied in this paper is Rényi Differential Privacy (RDP) guarantees for general discrete local mechanisms in the shuffle privacy model. In the shuffle model, each of the $n$ clients randomizes its response using a loc…
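To fix ideas, a discrete local randomizer plus shuffler, with $k$-ary randomized response as the canonical local mechanism:

```python
import numpy as np

def k_randomized_response(x: int, k: int, eps: float, rng) -> int:
    """Report the true symbol w.p. e^eps / (e^eps + k - 1); otherwise
    report one of the other k-1 symbols uniformly."""
    if rng.random() < np.exp(eps) / (np.exp(eps) + k - 1):
        return x
    other = int(rng.integers(0, k - 1))
    return other if other < x else other + 1

def shuffle_model(data, k: int, eps: float, rng):
    """Each client randomizes locally; a trusted shuffler then permutes
    the reports, which is what yields the amplified central guarantee."""
    return rng.permutation([k_randomized_response(x, k, eps, rng) for x in data])
```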
Robust Estimation for Random Graphs
We study the problem of robustly estimating the parameter $p$ of an Erdős-Rényi random graph on $n$ nodes, where a $\gamma$ fraction of nodes may be adversarially corrupted. After showing the deficiencies of canonical estimators, we design a co…
HD-cos Networks: Efficient Neural Architectures for Secure Multi-Party Computation
Multi-party computation (MPC) is a branch of cryptography where multiple non-colluding parties execute a well-designed protocol to securely compute a function. With the non-colluding party assumption, MPC has a cryptographic guarantee that…