Samson Zhou
Transductive and Learning-Augmented Online Regression
Motivated by the predictable nature of real-life data streams, we study online regression when the learner has access to predictions about future examples. In the extreme case, called transductive online learning, the sequence of exampl…
On Fine-Grained Distinct Element Estimation
We study the problem of distributed distinct element estimation, where $\alpha$ servers each receive a subset of a universe $[n]$ and aim to compute a $(1+\varepsilon)$-approximation to the number of distinct elements using minimal communicatio…
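To make the target quantity concrete, here is a minimal sketch that computes the exact number of distinct elements across the servers' subsets and checks a candidate estimate against the $(1+\varepsilon)$ guarantee. This is not the paper's communication protocol; the subsets, the estimate, and `eps` below are hypothetical example data.

```python
# Compute the exact target of distributed distinct element estimation:
# the number of distinct universe elements across all servers' subsets.

def exact_distinct_elements(server_subsets):
    """Union all servers' subsets and count distinct elements."""
    seen = set()
    for subset in server_subsets:
        seen.update(subset)
    return len(seen)

def is_valid_approximation(estimate, exact, eps):
    """Check the (1 + eps)-approximation guarantee."""
    return exact / (1 + eps) <= estimate <= (1 + eps) * exact

# Example: alpha = 3 servers over the universe [n] = {0, ..., 9}.
subsets = [{0, 1, 2, 5}, {2, 3, 5}, {5, 7, 9}]
exact = exact_distinct_elements(subsets)              # 7 distinct elements
print(exact, is_valid_approximation(7.5, exact, eps=0.1))
```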
Relative Error Fair Clustering in the Weak-Strong Oracle Model
We study fair clustering problems in a setting where distance information is obtained from two sources: a strong oracle providing exact distances, but at a high cost, and a weak oracle providing potentially inaccurate distance estimates at…
Perfect Sampling in Turnstile Streams Beyond Small Moments
Given a vector $x \in \mathbb{R}^n$ induced by a turnstile stream $S$ and a non-negative function $G: \mathbb{R} \to \mathbb{R}$, a perfect $G$-sampler outputs an index $i$ with probability $\frac{G(x_i)}{\sum_{j\in[n]} G(x_j)} + \frac{1}{\mathrm{poly}(n)}$. Jayaram and Woodruff (FOCS 2018) introduced a perfect $L_p$-sam…
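As a reference point for the target distribution only, the sketch below samples an index $i$ with probability proportional to $G(x_i)$ when the full vector $x$ is available. A turnstile-stream algorithm cannot store $x$, so this is not the paper's sampler; the vector and the choice $G(t)=t^2$ (an $L_2$-sampler) are hypothetical examples.

```python
import random

def offline_g_sample(x, G):
    """Sample index i with probability G(x_i) / sum_j G(x_j), given full access to x."""
    weights = [G(xi) for xi in x]
    total = sum(weights)
    if total == 0:
        raise ValueError("all weights are zero")
    r = random.random() * total
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(x) - 1  # guard against floating-point rounding

# Example: G(t) = t**2 corresponds to an L_2 sampler.
x = [3.0, -1.0, 0.0, 2.0]
print(offline_g_sample(x, lambda t: t * t))  # index 0 w.p. 9/14, index 3 w.p. 4/14, ...
```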
On the Price of Differential Privacy for Hierarchical Clustering
Hierarchical clustering is a fundamental unsupervised machine learning task with the aim of organizing data into a hierarchy of clusters. Many applications of hierarchical clustering involve sensitive user information, therefore motivating…
Fast, Space-Optimal Streaming Algorithms for Clustering and Subspace Embeddings
We show that both clustering and subspace embeddings can be performed in the streaming model with the same asymptotic efficiency as in the central/offline setting. For $(k, z)$-clustering in the streaming model, we achieve a number of word…
On Socially Fair Low-Rank Approximation and Column Subset Selection
Low-rank approximation and column subset selection are two fundamental and related problems that are applied across a wealth of machine learning applications. In this paper, we study the question of socially fair low-rank approximation and…
Adversarially Robust Dense-Sparse Tradeoffs via Heavy-Hitters
In the adversarial streaming model, the input is a sequence of adaptive updates that defines an underlying dataset and the goal is to approximate, collect, or compute some statistic while using space sublinear in the size of the dataset. I…
On Approximability of $\ell_2^2$ Min-Sum Clustering
The $\ell_2^2$ min-sum $k$-clustering problem is to partition an input set into clusters $C_1,\ldots,C_k$ to minimize $\sum_{i=1}^k\sum_{p,q\in C_i}\|p-q\|_2^2$. Although $\ell_2^2$ min-sum $k$-clustering is NP-hard, it is not known whethe…
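Since the objective is fully stated, a small sketch that evaluates it for a given partition may help: it sums $\|p-q\|_2^2$ over all ordered pairs within each cluster, exactly as the formula is written. The points and the partition are hypothetical, and this evaluates the cost only; it is not an approximation algorithm for the problem.

```python
def min_sum_cost(clusters):
    """clusters: list of clusters, each a list of points (tuples of floats).
    Returns sum over clusters of sum_{p,q in C} ||p - q||_2^2 (ordered pairs)."""
    cost = 0.0
    for C in clusters:
        for p in C:
            for q in C:
                cost += sum((pi - qi) ** 2 for pi, qi in zip(p, q))
    return cost

# Example partition with k = 2 clusters.
clusters = [
    [(0.0, 0.0), (1.0, 0.0)],               # C_1
    [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)],   # C_2
]
print(min_sum_cost(clusters))  # 10.0
```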
A Strong Separation for Adversarially Robust $\ell_0$ Estimation for Linear Sketches
The majority of streaming problems are defined and analyzed in a static setting, where the data stream is any worst-case sequence of insertions and deletions that is fixed in advance. However, many real-world applications require a more fl…
Fair Submodular Cover
Submodular optimization is a fundamental problem with many applications in machine learning, often involving decision-making over datasets with sensitive attributes such as gender or age. In such settings, it is often desirable to produce …
Streaming Algorithms with Few State Changes
In this paper, we study streaming algorithms that minimize the number of changes made to their internal state (i.e., memory contents). While the design of streaming algorithms typically focuses on minimizing space and update time, these me…
Private Vector Mean Estimation in the Shuffle Model: Optimal Rates Require Many Messages
We study the problem of private vector mean estimation in the shuffle model of privacy where $n$ users each have a unit vector $v^{(i)} \in\mathbb{R}^d$. We propose a new multi-message protocol that achieves the optimal error using $\tilde…
Near-Optimal $k$-Clustering in the Sliding Window Model
Clustering is an important technique for identifying structural information in large-scale data analysis, where the underlying dataset may be too large to store. In many applications, recent data can provide more accurate information and t…
Streaming Euclidean $k$-median and $k$-means with $o(\log n)$ Space
We consider the classic Euclidean $k$-median and $k$-means objective on data streams, where the goal is to provide a $(1+\varepsilon)$-approximation to the optimal $k$-median or $k$-means solution, while using as little memory as possible.…
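For context on what is being approximated, the sketch below evaluates the offline Euclidean $k$-median ($z=1$) and $k$-means ($z=2$) cost of a candidate set of $k$ centers: each point contributes its distance (or squared distance) to its nearest center. The points and centers are hypothetical, and nothing here reflects the streaming algorithm or its memory usage.

```python
import math

def clustering_cost(points, centers, z):
    """z = 1 gives the k-median cost, z = 2 gives the k-means cost."""
    cost = 0.0
    for p in points:
        nearest = min(math.dist(p, c) for c in centers)
        cost += nearest ** z
    return cost

points = [(0.0, 0.0), (1.0, 1.0), (9.0, 9.0), (10.0, 10.0)]
centers = [(0.5, 0.5), (9.5, 9.5)]        # a candidate solution with k = 2
print(clustering_cost(points, centers, z=1))  # k-median cost
print(clustering_cost(points, centers, z=2))  # k-means cost
```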
Differentially Private Aggregation via Imperfect Shuffling
In this paper, we introduce the imperfect shuffle differential privacy model, where messages sent from users are shuffled in an almost uniform manner before being observed by a curator for private aggregation. We then consider the private …
Private Data Stream Analysis for Universal Symmetric Norm Estimation
We study how to release summary statistics on a data stream subject to the constraint of differential privacy. In particular, we focus on releasing the family of symmetric norms, which are invariant under sign-flips and coordinate-wise per…
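To illustrate only the symmetry property mentioned here, the sketch below uses the top-$k$ norm (sum of the $k$ largest absolute values), a standard example of a symmetric norm, and checks numerically that it is unchanged by sign flips and coordinate permutations. The vector and the choice $k=2$ are hypothetical, and this has nothing to do with the private estimator itself.

```python
import random

def top_k_norm(v, k):
    """Sum of the k largest absolute values: a standard symmetric norm."""
    return sum(sorted((abs(x) for x in v), reverse=True)[:k])

v = [3.0, -1.0, 4.0, 1.0, -5.0]
base = top_k_norm(v, k=2)

# Invariance under a random sign flip and a random coordinate permutation.
flipped = [x * random.choice([-1, 1]) for x in v]
permuted = random.sample(v, len(v))
assert abs(top_k_norm(flipped, 2) - base) < 1e-12
assert abs(top_k_norm(permuted, 2) - base) < 1e-12
print(base)  # 9.0 = |-5| + |4|
```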
Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix Factorization
We introduce efficient $(1+\varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem, where the inputs are a matrix $\mathbf{A}\in\{0,1\}^{n\times d}$, a rank parameter $k>0$, as well as an accuracy parameter…
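The snippet is truncated before the objective is stated, so as an assumption the sketch below uses the standard BMF error: given binary factors $\mathbf{U}\in\{0,1\}^{n\times k}$ and $\mathbf{V}\in\{0,1\}^{k\times d}$, count the entries where the Boolean product $\mathbf{U}\mathbf{V}$ disagrees with $\mathbf{A}$. The matrices below are hypothetical example data, and this evaluates the error only rather than computing a factorization.

```python
import numpy as np

def bmf_error(A, U, V):
    """Number of entries where the Boolean product U V disagrees with A
    (for 0/1 matrices this equals the squared Frobenius error)."""
    recon = (U @ V > 0).astype(np.int8)  # Boolean matrix product
    return int(np.sum(recon != A))

A = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 0, 1]], dtype=np.int8)
U = np.array([[1, 0], [1, 0], [0, 1]], dtype=np.int8)   # n x k with k = 2
V = np.array([[1, 1, 0], [0, 0, 1]], dtype=np.int8)     # k x d
print(bmf_error(A, U, V))  # 0: this rank-2 binary factorization is exact
```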
Robust Algorithms on Adaptive Inputs from Bounded Adversaries
We study dynamic algorithms robust to adaptive input generated from sources with bounded capabilities, such as sparsity or limited interaction. For example, we consider robust linear algebraic algorithms when the updates to the input are s…
Provable Data Subset Selection For Efficient Neural Network Training
Radial basis function neural networks (RBFNN) are well-known for their capability to approximate any continuous function on a closed bounded set with arbitrary precision given enough hidden neurons. In this paper, we introduce the…
Streaming Algorithms for Learning with Experts: Deterministic Versus Robust
In the online learning with experts problem, an algorithm must make a prediction about an outcome on each of $T$ days (or times), given a set of $n$ experts who make predictions on each day (or time). The algorithm is given feedback on the…
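To make the setting concrete, here is the classical randomized multiplicative-weights baseline for prediction with expert advice: on each day the learner follows an expert drawn in proportion to its weight, then down-weights every expert that was wrong. It keeps one weight per expert in memory, which is the kind of cost the paper's streaming algorithms aim to reduce; it is not the paper's algorithm, and the predictions, outcomes, and learning rate below are hypothetical.

```python
import random

def multiplicative_weights(expert_predictions, outcomes, eta=0.5):
    """expert_predictions[t][i]: 0/1 prediction of expert i on day t.
    outcomes[t]: true 0/1 outcome on day t. Returns the learner's mistake count."""
    n = len(expert_predictions[0])
    weights = [1.0] * n
    mistakes = 0
    for preds, outcome in zip(expert_predictions, outcomes):
        # Follow one expert, chosen with probability proportional to its weight.
        r = random.random() * sum(weights)
        acc, choice = 0.0, n - 1
        for i, w in enumerate(weights):
            acc += w
            if r <= acc:
                choice = i
                break
        if preds[choice] != outcome:
            mistakes += 1
        # Penalize every expert that was wrong on this day.
        weights = [w * (1 - eta) if p != outcome else w
                   for w, p in zip(weights, preds)]
    return mistakes

# Example: T = 4 days, n = 3 experts; expert 0 is always correct.
preds = [[1, 0, 1], [0, 0, 1], [1, 1, 0], [1, 0, 0]]
truth = [1, 0, 1, 1]
print(multiplicative_weights(preds, truth))
```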
Selective experience replay compression using coresets for lifelong deep reinforcement learning in medical imaging
Selective experience replay is a popular strategy for integrating lifelong learning with deep reinforcement learning. Selective experience replay aims to recount selected experiences from previous tasks to avoid catastrophic forgetting. Fu…
Differentially Private $L_2$-Heavy Hitters in the Sliding Window Model
The data management of large companies often prioritizes more recent data as a source of higher-accuracy predictions than outdated data. For example, the Facebook data policy retains user search histories for $6$ months while the Google dat…
On Differential Privacy and Adaptive Data Analysis with Bounded Space
We study the space complexity of the two related fields of differential privacy and adaptive data analysis. Specifically, (1) Under standard cryptographic assumptions, we show that there exists a problem P that requires exponentially more …
How to Make Your Approximation Algorithm Private: A Black-Box Differentially-Private Transformation for Tunable Approximation Algorithms of Functions with Low Sensitivity
We develop a framework for efficiently transforming certain approximation algorithms into differentially-private variants, in a black-box manner. Specifically, our results focus on algorithms A that output an approximation to a function f …
Sub-quadratic Algorithms for Kernel Matrices via Kernel Density Estimation
Kernel matrices, as well as weighted graphs represented by them, are ubiquitous objects in machine learning, statistics and other related fields. The main drawback of using kernel methods (learning and inference using kernel matrices) is e…
Optimal Algorithms for Linear Algebra in the Current Matrix Multiplication Time
We study fundamental problems in linear algebra, such as finding a maximal linearly independent subset of rows or columns (a basis), solving linear regression, or computing a subspace embedding. For these problems, we consider input matric…
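As a dense, unoptimized reference for two of the tasks named here, the sketch below selects a maximal linearly independent subset of rows (a row basis) by greedy rank checks and solves least-squares linear regression with NumPy. It illustrates the problem definitions only, not the paper's matrix-multiplication-time algorithms; the matrix and right-hand side are hypothetical.

```python
import numpy as np

def greedy_row_basis(A, tol=1e-10):
    """Return indices of a maximal linearly independent subset of rows of A."""
    chosen, rank = [], 0
    for i in range(A.shape[0]):
        candidate = A[chosen + [i], :]
        if np.linalg.matrix_rank(candidate, tol=tol) > rank:
            chosen.append(i)
            rank += 1
    return chosen

A = np.array([[1.0, 0.0, 1.0],
              [2.0, 0.0, 2.0],   # dependent on row 0
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])  # dependent on rows 0 and 2
b = np.array([1.0, 2.0, 3.0, 4.0])

print(greedy_row_basis(A))                 # [0, 2]
x, *_ = np.linalg.lstsq(A, b, rcond=None)  # least-squares linear regression
print(x)
```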
Near-Linear Sample Complexity for $L_p$ Polynomial Regression
We study $L_p$ polynomial regression. Given query access to a function $f:[-1,1] \rightarrow \mathbb{R}$, the goal is to find a degree $d$ polynomial $\hat{q}$ such that, for a given parameter $\varepsilon > 0$, $$ \|\hat{q}-f\|_p\le (1+\v…
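As a naive baseline for the $p=2$ case of this goal, the sketch below queries $f$ on uniformly spaced points in $[-1,1]$ and fits a degree-$d$ polynomial by least squares with `numpy.polyfit`. Uniform sampling is only a reference point, not the paper's carefully weighted sampling scheme for general $L_p$; the target function $f$, degree, and sample count are hypothetical choices.

```python
import numpy as np

def fit_degree_d(f, d, num_samples=200):
    """Fit a degree-d polynomial to f on [-1, 1] via least squares (p = 2)."""
    xs = np.linspace(-1.0, 1.0, num_samples)   # query points for f
    ys = f(xs)
    coeffs = np.polyfit(xs, ys, deg=d)
    return np.poly1d(coeffs)

f = lambda x: np.exp(x)        # example target function on [-1, 1]
q_hat = fit_degree_d(f, d=3)

# Empirical L_2 error of the fit on a fine grid.
grid = np.linspace(-1.0, 1.0, 2001)
print(np.sqrt(np.mean((q_hat(grid) - f(grid)) ** 2)))
```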
Learning-Augmented Algorithms for Online Linear and Semidefinite Programming
Semidefinite programming (SDP) is a unifying framework that generalizes both linear programming and quadratically-constrained quadratic programming, while also yielding efficient solvers, both in theory and in practice. However, there exis…