Badih Ghazi
Private Hyperparameter Tuning with Ex-Post Guarantee
The conventional approach in differential privacy (DP) literature formulates the privacy-utility trade-off with a "privacy-first" perspective: for a predetermined level of privacy, a certain utility is achievable. However, practitioners of…
Quantifying Cross-Modality Memorization in Vision-Language Models
Understanding what and how neural networks memorize during training is crucial, both from the perspective of unintentional memorization of potentially sensitive information and from the standpoint of effective knowledge acquisition for rea…
PREM: Privately Answering Statistical Queries with Relative Error
We introduce $\mathsf{PREM}$ (Private Relative Error Multiplicative weight update), a new framework for generating synthetic data that achieves a relative error guarantee for statistical queries under $(\varepsilon, \delta)$ differential privac…
Linear-Time User-Level DP-SCO via Robust Statistics
User-level differentially private stochastic convex optimization (DP-SCO) has garnered significant attention due to the paramount importance of safeguarding user privacy in modern large-scale machine learning applications. Current methods,…
Scaling Laws for Differentially Private Language Models
Scaling laws have emerged as important components of large language model (LLM) training as they can predict performance gains through scale, and provide guidance on important hyper-parameter choices that would otherwise be expensive. LLMs…
Balls-and-Bins Sampling for DP-SGD
We introduce the Balls-and-Bins sampling for differentially private (DP) optimization methods such as DP-SGD. While it has been common practice to use some form of shuffling in DP-SGD implementations, privacy accounting algorithms have typ…
Differential Privacy on Trust Graphs
We study differential privacy (DP) in a multi-party setting where each party only trusts a (known) subset of the other parties with its data. Specifically, given a trust graph where vertices correspond to parties and neighbors are mutually…
Unlearn and Burn: Adversarial Machine Unlearning Requests Destroy Model Accuracy
Machine unlearning algorithms, designed for selective removal of training data from models, have emerged as a promising approach to growing privacy concerns. In this work, we expose a critical yet underexplored vulnerability in the deploym…
On Convex Optimization with Semi-Sensitive Features
We study the differentially private (DP) empirical risk minimization (ERM) problem under the semi-sensitive DP setting where only some features are sensitive. This generalizes the Label DP setting where only the label is sensitive. We give…
On Computing Pairwise Statistics with Local Differential Privacy
We study the problem of computing pairwise statistics, i.e., ones of the form $\binom{n}{2}^{-1} \sum_{i \ne j} f(x_i, x_j)$, where $x_i$ denotes the input to the $i$th user, with differential privacy (DP) in the local model. This formulat…
Individualized Privacy Accounting via Subsampling with Applications in Combinatorial Optimization
In this work, we give a new technique for analyzing individualized privacy accounting via the following simple observation: if an algorithm is one-sided add-DP, then its subsampled variant satisfies two-sided DP. From this, we obtain sever…
Differentially Private Optimization with Sparse Gradients
Motivated by applications of large embedding models, we study differentially private (DP) optimization problems under sparsity of individual gradients. We start with new near-optimal bounds for the classic mean estimation problem but with …
Differentially Private Ad Conversion Measurement
In this work, we study ad conversion measurement, a central functionality in digital advertising, where an advertiser seeks to estimate advertiser website (or mobile app) conversions attributed to ad impressions that users have interacted …
How Private are DP-SGD Implementations?
We demonstrate a substantial gap between the privacy guarantees of the Adaptive Batch Linear Queries (ABLQ) mechanism under different types of batch sampling: (i) Shuffling, and (ii) Poisson subsampling; the typical analysis of Differentia…
Training Differentially Private Ad Prediction Models with Semi-Sensitive Features
Motivated by problems arising in digital advertising, we introduce the task of training differentially private (DP) machine learning models with semi-sensitive features. In this setting, a subset of the features is known to the attacker (a…
Optimal Unbiased Randomizers for Regression with Label Differential Privacy
We propose a new family of label randomizers for training regression models under the constraint of label differential privacy (DP). In particular, we leverage the trade-offs between bias and variance to construct better label randomizers …
Summary Reports Optimization in the Privacy Sandbox Attribution Reporting API
The Privacy Sandbox Attribution Reporting API has been recently deployed by Google Chrome to support the basic advertising functionality of attribution reporting (aka conversion measurement) after deprecation of third-party cookies. The AP…
Sparsity-Preserving Differentially Private Training of Large Embedding Models
As the use of large embedding models in recommendation systems and language applications increases, concerns over user data privacy have also risen. DP-SGD, a training algorithm that combines differential privacy with stochastic gradient d…
User-Level Differential Privacy With Few Examples Per User
Previous work on user-level differential privacy (DP) [Ghazi et al. NeurIPS 2021, Bun et al. STOC 2023] obtained generic algorithms that work for various learning tasks. However, their focus was on the example-rich regime, where the users …
Differentially Private Aggregation via Imperfect Shuffling
In this paper, we introduce the imperfect shuffle differential privacy model, where messages sent from users are shuffled in an almost uniform manner before being observed by a curator for private aggregation. We then consider the private …
Optimizing Hierarchical Queries for the Attribution Reporting API
We study the task of performing hierarchical queries based on summary reports from the Attribution Reporting API for ad conversion measurement. We demonstrate that methods from optimization and differential privacy can help cope with…
Ticketed Learning-Unlearning Schemes
We consider the learning-unlearning paradigm defined as follows. First, given a dataset, the goal is to learn a good predictor, such as one minimizing a certain loss. Subsequently, given any subset of examples that wish to be unlearnt, the…
Differentially Private Data Release over Multiple Tables
We study synthetic data release for answering multiple linear queries over a set of database tables in a differentially private way. Two special cases have been considered in the literature: how to release a synthetic dataset for answering…
Differentially Private Heatmaps
We consider the task of producing heatmaps from users' aggregated data while protecting their privacy. We give a differentially private (DP) algorithm for this task and demonstrate its advantages over previous algorithms on real-world data…
On Differentially Private Sampling from Gaussian and Product Distributions
Given a dataset of $n$ i.i.d. samples from an unknown distribution $P$, we consider the problem of generating a sample from a distribution that is close to $P$ in total variation distance, under the constraint of differential privacy (DP).…
Pure-DP Aggregation in the Shuffle Model: Error-Optimal and Communication-Efficient
We obtain a new protocol for binary counting in the $\varepsilon$-shuffle-DP model with error $O(1/\varepsilon)$ and expected communication $\tilde{O}\left(\frac{\log n}{\varepsilon}\right)$ messages per user. Previous protocols incur eith…
On User-Level Private Convex Optimization
We introduce a new mechanism for stochastic convex optimization (SCO) with user-level differential privacy guarantees. The convergence rates of this mechanism are similar to those in the prior work of Levy et al. (2021); Narayanan et al. (…
Towards Separating Computational and Statistical Differential Privacy
Computational differential privacy (CDP) is a natural relaxation of the standard notion of (statistical) differential privacy (SDP) proposed by Beimel, Nissim, and Omri (CRYPTO 2008) and Mironov, Pandey, Reingold, and Vadhan (CRYPTO 2009).…