Krishna Pillutla
InvisibleInk: High-Utility and Low-Cost Text Generation with Differential Privacy
As major progress in LLM-based long-form text generation enables paradigms such as retrieval-augmented generation (RAG) and inference-time scaling, safely incorporating private information into the generation remains a critical open questi…
Correlated Noise Mechanisms for Differentially Private Learning
This monograph explores the design and analysis of correlated noise mechanisms for differential privacy (DP), focusing on their application to private training of AI and machine learning models via the core primitive of estimation of weigh…
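As a rough illustration of the core idea (with illustrative names and notation, not the monograph's exact algorithms): rather than adding independent Gaussian noise at each step of DP-SGD, a correlated noise mechanism draws i.i.d. seed noise and linearly mixes it across iterations, which can cancel much of the accumulated error in the learned weights.

    import numpy as np

    def dp_sgd_correlated(grads, C, sigma, lr=0.1):
        """Sketch: noisy SGD where the injected noise is correlated across steps.

        grads: (T, d) array of already-clipped per-step gradients.
        C: (T, T) lower-triangular "encoder" matrix; privatizing C @ grads and
        decoding with C^{-1} means step t effectively sees noise C^{-1} @ z,
        correlated across iterations (C = identity recovers plain DP-SGD).
        """
        T, d = grads.shape
        z = sigma * np.random.randn(T, d)   # i.i.d. Gaussian seed noise
        corr_noise = np.linalg.inv(C) @ z   # correlated across iterations
        w = np.zeros(d)
        iterates = []
        for t in range(T):
            w = w - lr * (grads[t] + corr_noise[t])
            iterates.append(w.copy())
        return np.array(iterates)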
An Inversion Theorem for Buffered Linear Toeplitz (BLT) Matrices and Applications to Streaming Differential Privacy
Buffered Linear Toeplitz (BLT) matrices are a family of parameterized lower-triangular matrices that play an important role in streaming differential privacy with correlated noise. Our main result is a BLT inversion theorem: the inverse of…
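A minimal numerical illustration, assuming one common BLT parameterization (unit diagonal, and entry (i, j) below the diagonal equal to sum_k omega_k * theta_k^(i-j-1)); the inversion theorem predicts the inverse is again of buffered Toeplitz form, which the Toeplitz structure check below reflects.

    import numpy as np

    def blt(theta, omega, n):
        # Assumed parameterization of an n x n BLT matrix: diagonal 1,
        # entry (i, j) for i > j given by sum_k omega[k] * theta[k]**(i-j-1).
        M = np.eye(n)
        for i in range(n):
            for j in range(i):
                M[i, j] = sum(w * t ** (i - j - 1) for t, w in zip(theta, omega))
        return M

    M = blt(theta=[0.9, 0.5], omega=[0.3, 0.2], n=6)
    M_inv = np.linalg.inv(M)
    # Lower-triangular Toeplitz matrices are closed under inversion, so each
    # subdiagonal of the inverse is constant:
    print(np.allclose(np.diag(M_inv, -1), M_inv[1, 0]))  # True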
Fine-Tuning Large Language Models with User-Level Differential Privacy
We investigate practical and scalable algorithms for training large language models (LLMs) with user-level differential privacy (DP) in order to provably safeguard all the examples contributed by each user. We study two variants of DP-SGD …
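A minimal sketch of the user-level idea (illustrative, not the paper's exact variants): clip each user's aggregated contribution rather than each example's, so that the added noise masks everything a single user contributed.

    import numpy as np

    def user_level_noisy_update(user_grads, clip_norm, sigma):
        """One illustrative user-level DP-SGD step.

        user_grads: list of (n_u, d) arrays, one per sampled user.
        Each user's gradients are averaged into a single vector, which is
        then clipped, so the Gaussian noise hides the presence or absence
        of *all* of that user's examples at once.
        """
        clipped = []
        for g in user_grads:
            u = g.mean(axis=0)  # one contribution vector per user
            u = u * min(1.0, clip_norm / (np.linalg.norm(u) + 1e-12))
            clipped.append(u)
        noise = sigma * clip_norm * np.random.randn(len(clipped[0]))
        return (np.sum(clipped, axis=0) + noise) / len(clipped)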
Efficient and Near-Optimal Noise Generation for Streaming Differential Privacy
In the task of differentially private (DP) continual counting, we receive a stream of increments and our goal is to output an approximate running total of these increments, without revealing too much about any specific increment. Despite i…
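For context, a minimal sketch of the classic binary-tree baseline that this line of work improves on: each node of a tree over the stream holds one noisy partial sum, and every prefix sum is assembled from O(log n) nodes, so per-step error grows only polylogarithmically.

    import numpy as np
    from functools import lru_cache

    def tree_private_prefix_sums(x, sigma):
        """Illustrative binary-tree mechanism for DP continual counting."""
        x = np.asarray(x, dtype=float)
        n = len(x)

        @lru_cache(maxsize=None)
        def noisy(lo, hi):  # one noisy sum per dyadic interval [lo, hi)
            return x[lo:hi].sum() + sigma * np.random.randn()

        def prefix(t):  # assemble the prefix [0, t) from dyadic pieces
            total, lo = 0.0, 0
            for k in reversed(range(n.bit_length() + 1)):
                if lo + 2 ** k <= t:
                    total += noisy(lo, lo + 2 ** k)
                    lo += 2 ** k
            return total

        return np.array([prefix(t) for t in range(1, n + 1)])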
Distributionally Robust Optimization with Bias and Variance Reduction
We consider the distributionally robust optimization (DRO) problem with spectral risk-based uncertainty set and $f$-divergence penalty. This formulation includes common risk-sensitive learning objectives such as regularized condition value…
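As a small, simplified illustration of the objective family: a spectral risk weights the *sorted* losses by a nondecreasing spectrum, so a uniform spectrum recovers empirical risk minimization while a spectrum concentrated on the largest losses recovers CVaR-like or worst-case risk.

    import numpy as np

    def spectral_risk(losses, spectrum):
        # Weighted average of sorted losses; the spectrum is nondecreasing
        # and sums to 1, putting more weight on the largest losses.
        losses = np.sort(losses)  # ascending
        spectrum = np.asarray(spectrum, dtype=float)
        assert np.all(np.diff(spectrum) >= 0) and np.isclose(spectrum.sum(), 1.0)
        return float(losses @ spectrum)

    # Example: CVaR at level 0.5 over 4 losses weights each of the top 2 by 1/2.
    print(spectral_risk([0.1, 0.9, 0.4, 0.7], [0.0, 0.0, 0.5, 0.5]))  # 0.8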
User Inference Attacks on Large Language Models
Fine-tuning is a common and effective method for tailoring large language models (LLMs) to specialized tasks and applications. In this paper, we study the privacy implications of fine-tuning LLMs on user data. To this end, we consider a re…
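A hedged sketch of a likelihood-ratio style test statistic in this spirit (illustrative; the paper's exact attack may differ): aggregate, over a candidate user's held-out documents, how much more likely the fine-tuned model finds them than a reference model.

    import numpy as np

    def user_inference_score(user_docs, log_p_finetuned, log_p_reference):
        # log_p_* are callables returning a document's total log-likelihood
        # under the fine-tuned and reference models, respectively. A large
        # average log-likelihood ratio suggests the user's data was in the
        # fine-tuning set.
        scores = [log_p_finetuned(d) - log_p_reference(d) for d in user_docs]
        return float(np.mean(scores))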
Correlated Noise Provably Beats Independent Noise for Differentially Private Learning
Differentially private learning algorithms inject noise into the learning process. While the most common private learning algorithm, DP-SGD, adds independent Gaussian noise in each iteration, recent work on matrix factorization mechanisms …
Towards Federated Foundation Models: Scalable Dataset Pipelines for Group-Structured Learning
We introduce Dataset Grouper, a library to create large-scale group-structured (e.g., federated) datasets, enabling federated learning simulation at the scale of foundation models. This library facilitates the creation of group-structured …
Unleashing the Power of Randomization in Auditing Differentially Private ML
We present a rigorous methodology for auditing differentially private machine learning algorithms by adding multiple carefully designed examples called canaries. We take a first principles approach based on three key components. First, we …
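A minimal sketch of how canary-based guesses translate into an empirical privacy bound (illustrative; a real audit needs confidence intervals, e.g. Clopper-Pearson, over many trials): for an (eps, delta)-DP mechanism, TPR <= exp(eps) * FPR + delta, so observed rates imply a lower bound on eps.

    import numpy as np

    def empirical_epsilon(scores_in, scores_out, threshold, delta=0.0):
        # scores_in: attack scores when the canary was included;
        # scores_out: scores when it was excluded. The DP constraint
        # TPR <= exp(eps) * FPR + delta gives eps >= log((TPR - delta) / FPR).
        tpr = np.mean(np.asarray(scores_in) > threshold)
        fpr = np.mean(np.asarray(scores_out) > threshold)
        if fpr == 0 or tpr <= delta:
            return 0.0
        return float(np.log((tpr - delta) / fpr))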
Modified Gauss-Newton Algorithms under Noise
Gauss-Newton methods and their stochastic version have been widely used in machine learning and signal processing. Their nonsmooth counterparts, modified Gauss-Newton or prox-linear algorithms, can lead to contrasting outcomes when compare…
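As a small illustration of the prox-linear (modified Gauss-Newton) template for min_x ||c(x)||^2: linearize c around the current point and add a proximal term, which reduces each step to a damped, Levenberg-Marquardt-like linear solve. This is a generic sketch, not the paper's specific algorithm.

    import numpy as np

    def prox_linear_step(c, J, x, t):
        # One step: x+ = argmin_y ||c(x) + J(x)(y - x)||^2 + ||y - x||^2 / (2t),
        # whose optimality condition is the damped normal equation below.
        cx, Jx = c(x), J(x)
        d = np.linalg.solve(Jx.T @ Jx + np.eye(len(x)) / (2 * t), -Jx.T @ cx)
        return x + d

    # Example: residuals c(x) = A x - b converge to the least-squares fit.
    A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    b = np.array([1.0, 2.0, 3.0])
    x = np.zeros(2)
    for _ in range(20):
        x = prox_linear_step(lambda x: A @ x - b, lambda x: A, x, t=1.0)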
MAUVE Scores for Generative Models: Theory and Practice
Generative artificial intelligence has made significant strides, producing text indistinguishable from human prose and remarkably photorealistic images. Automatically measuring how close the generated data distribution is to the target dis…
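A hedged sketch of the divergence-frontier computation behind MAUVE-style scores (the quantizer, scaling constant, and summary below are illustrative choices, not the official implementation): jointly quantize model and human embeddings, then trace KL divergences to mixtures of the two induced histograms.

    import numpy as np
    from scipy.special import rel_entr
    from sklearn.cluster import KMeans

    def mauve_like(p_feats, q_feats, k=16, c=5.0, n_lambda=50):
        # Quantize both samples jointly, form histograms p and q over the
        # clusters, trace (KL(q||r), KL(p||r)) for mixtures r = lam*p + (1-lam)*q,
        # and summarize the curve by an area in exp(-c * KL) coordinates.
        km = KMeans(n_clusters=k, n_init=10).fit(np.vstack([p_feats, q_feats]))
        p = np.bincount(km.predict(p_feats), minlength=k) / len(p_feats)
        q = np.bincount(km.predict(q_feats), minlength=k) / len(q_feats)
        xs, ys = [], []
        for lam in np.linspace(1e-3, 1 - 1e-3, n_lambda):
            r = lam * p + (1 - lam) * q
            xs.append(np.exp(-c * rel_entr(q, r).sum()))
            ys.append(np.exp(-c * rel_entr(p, r).sum()))
        return float(abs(np.trapz(ys, xs)))  # area under the frontier curve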
Stochastic Optimization for Spectral Risk Measures
Spectral risk objectives (also called $L$-risks) allow learning systems to interpolate between optimizing average-case performance (as in empirical risk minimization) and worst-case performance on a task. We develop stochastic algori…
Statistical and Computational Guarantees for Influence Diagnostics
Influence diagnostics such as influence functions and approximate maximum influence perturbations are popular in machine learning and in AI domain applications. Influence diagnostics are powerful statistical tools to identify influential d…
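A one-line sketch of the classical influence-function diagnostic: up-weighting training point z_i by an infinitesimal amount changes a test loss by roughly -grad_test^T H^{-1} grad_train_i, where H is the Hessian of the training objective at the fitted parameters.

    import numpy as np

    def influence_on_test_loss(grad_train_i, hessian, grad_test):
        # Classical first-order influence of training point z_i on a test
        # loss; solving with H directly (no explicit inverse) for stability.
        return float(-grad_test @ np.linalg.solve(hessian, grad_train_i))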
Tackling Distribution Shifts in Federated Learning with Superquantile Aggregation
Differentially Private Federated Quantiles with the Distributed Discrete Gaussian Mechanism
Federated Learning with Heterogeneous Data: A Superquantile Optimization Approach
We present a federated learning framework that is designed to robustly deliver good predictive performance across individual clients with heterogeneous data. The proposed approach hinges upon a superquantile-based learning objective that c…
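A small illustration of the aggregate at the heart of this line of work (simplified: it ignores the fractional boundary term that an exact CVaR computation needs when the tail fraction is not an integer number of clients): the superquantile at level theta averages the worst (1 - theta)-fraction of per-client losses.

    import numpy as np

    def superquantile(losses, theta):
        # theta = 0 recovers the plain mean over clients; theta -> 1
        # approaches the single worst client's loss.
        losses = np.sort(np.asarray(losses, dtype=float))
        k = int(np.ceil((1 - theta) * len(losses)))
        return float(losses[-k:].mean())

    print(superquantile([0.1, 0.2, 0.9, 1.1], theta=0.5))  # mean of top 2 = 1.0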
Federated Learning with Partial Model Personalization
We consider two federated learning algorithms for training partially personalized models, where the shared and personal parameters are updated either simultaneously or alternately on the devices. Both algorithms have been proposed in the l…
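A minimal sketch of the alternating variant of one local round (illustrative; gradient callables and learning rate are stand-ins): refresh the personal parameters with the shared ones frozen, then step on the shared parameters, which are later aggregated across devices. The simultaneous variant updates both blocks at once.

    def alternating_personalized_step(shared, personal,
                                      grad_shared, grad_personal, lr):
        # First the device-specific block, then the globally shared block.
        personal = personal - lr * grad_personal(shared, personal)
        shared = shared - lr * grad_shared(shared, personal)
        return shared, personal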
Superquantiles at Work: Machine Learning Applications and Efficient Subgradient Computation
R. Tyrrell Rockafellar and collaborators introduced, in a series of works, new regression modeling methods based on the notion of superquantile (or conditional value-at-risk). These methods have been influential in economics, finance, manag…
Robust Aggregation for Federated Learning
Federated learning is the centralized training of statistical models from decentralized data on mobile devices while preserving the privacy of each device. We present a robust aggregation approach to make federated learning robust to setti…
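A minimal sketch of the geometric-median aggregation this approach builds on, computed by a smoothed Weiszfeld-style iteration (illustrative constants; the smoothing parameter nu guards against division by zero near a client's point).

    import numpy as np

    def smoothed_weiszfeld(points, weights, nu=1e-6, iters=100):
        # points: (m, d) client updates; weights: (m,) client weights.
        # The geometric median is robust to a minority of corrupted updates,
        # unlike the weighted mean used by standard federated averaging.
        points = np.asarray(points, dtype=float)
        z = np.average(points, axis=0, weights=weights)  # start at the mean
        for _ in range(iters):
            dists = np.maximum(np.linalg.norm(points - z, axis=1), nu)
            w = weights / dists
            z = (w[:, None] * points).sum(axis=0) / w.sum()
        return z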
Federated Learning with Superquantile Aggregation for Heterogeneous Data
We present a federated learning framework that is designed to robustly deliver good predictive performance across individual clients with heterogeneous data. The proposed approach hinges upon a superquantile-based learning objective that c…
LLC: Accurate, Multi-purpose Learnt Low-dimensional Binary Codes
Learning binary representations of instances and classes is a classical problem with several high potential applications. In modern settings, the compression of high-dimensional neural representations to low-dimensional binary codes is a c…
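As a generic illustration of the binary-code setting (the random projection below is a stand-in for the paper's learned codebook): compress real-valued embeddings to short binary codes via the sign of a projection, then retrieve by Hamming distance.

    import numpy as np

    def binary_codes(embeddings, W):
        # Sign of a (here random, in the paper learned) projection.
        return (embeddings @ W > 0).astype(np.uint8)

    def hamming_nn(query_code, db_codes):
        # Nearest neighbor under Hamming distance over the binary codes.
        return int(np.argmin((db_codes != query_code).sum(axis=1)))

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 64))   # 64-d embeddings
    W = rng.normal(size=(64, 16))    # project to 16-bit codes
    db = binary_codes(X, W)
    print(hamming_nn(binary_codes(X[:1], W)[0], db))  # 0: nearest is itself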
Divergence Frontiers for Generative Models: Sample Complexity, Quantization Level, and Frontier Integral
The spectacular success of deep generative models calls for quantitative tools to measure their statistical performance. Divergence frontiers have recently been proposed as an evaluation framework for generative models, due to their abilit…
Divergence Frontiers for Generative Models: Sample Complexity, Quantization Effects, and Frontier Integrals
The spectacular success of deep generative models calls for quantitative tools to measure their statistical performance. Divergence frontiers have recently been proposed as an evaluation framework for generative models, due to their abilit…
A Superquantile Approach to Federated Learning with Heterogeneous Devices