Samson Zhou
Transductive and Learning-Augmented Online Regression
Motivated by the predictable nature of real-life data streams, we study online regression when the learner has access to predictions about future examples. In the extreme case, called transductive online learning, the sequence of exampl…
On Fine-Grained Distinct Element Estimation
We study the problem of distributed distinct element estimation, where $\alpha$ servers each receive a subset of a universe $[n]$ and aim to compute a $(1+\varepsilon)$-approximation to the number of distinct elements using minimal communicatio…
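To make the target quantity concrete, here is a minimal sketch that computes the exact number of distinct elements across the servers' subsets and checks a candidate estimate against the $(1+\varepsilon)$ guarantee. This is not the paper's communication protocol; the subsets, the estimate, and `eps` below are hypothetical example data.

```python
# Compute the exact target of distributed distinct element estimation:
# the number of distinct universe elements across all servers' subsets.

def exact_distinct_elements(server_subsets):
    """Union all servers' subsets and count distinct elements."""
    seen = set()
    for subset in server_subsets:
        seen.update(subset)
    return len(seen)

def is_valid_approximation(estimate, exact, eps):
    """Check the (1 + eps)-approximation guarantee."""
    return exact / (1 + eps) <= estimate <= (1 + eps) * exact

# Example: alpha = 3 servers over the universe [n] = {0, ..., 9}.
subsets = [{0, 1, 2, 5}, {2, 3, 5}, {5, 7, 9}]
exact = exact_distinct_elements(subsets)              # 7 distinct elements
print(exact, is_valid_approximation(7.5, exact, eps=0.1))
```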
Relative Error Fair Clustering in the Weak-Strong Oracle Model
We study fair clustering problems in a setting where distance information is obtained from two sources: a strong oracle providing exact distances, but at a high cost, and a weak oracle providing potentially inaccurate distance estimates at…
Perfect Sampling in Turnstile Streams Beyond Small Moments
Given a vector $x \in \mathbb{R}^n$ induced by a turnstile stream $S$ and a non-negative function $G: \mathbb{R} \to \mathbb{R}$, a perfect $G$-sampler outputs an index $i$ with probability $\frac{G(x_i)}{\sum_{j\in[n]} G(x_j)} + \frac{1}{\mathrm{poly}(n)}$. Jayaram and Woodruff (FOCS 2018) introduced a perfect $L_p$-sam…
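As a reference point for the target distribution only, the sketch below samples an index $i$ with probability proportional to $G(x_i)$ when the full vector $x$ is available. A turnstile-stream algorithm cannot store $x$, so this is not the paper's sampler; the vector and the choice $G(t)=t^2$ (an $L_2$-sampler) are hypothetical examples.

```python
import random

def offline_g_sample(x, G):
    """Sample index i with probability G(x_i) / sum_j G(x_j), given full access to x."""
    weights = [G(xi) for xi in x]
    total = sum(weights)
    if total == 0:
        raise ValueError("all weights are zero")
    r = random.random() * total
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(x) - 1  # guard against floating-point rounding

# Example: G(t) = t**2 corresponds to an L_2 sampler.
x = [3.0, -1.0, 0.0, 2.0]
print(offline_g_sample(x, lambda t: t * t))  # index 0 w.p. 9/14, index 3 w.p. 4/14, ...
```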
On the Price of Differential Privacy for Hierarchical Clustering
Hierarchical clustering is a fundamental unsupervised machine learning task with the aim of organizing data into a hierarchy of clusters. Many applications of hierarchical clustering involve sensitive user information, therefore motivating…
Fast, Space-Optimal Streaming Algorithms for Clustering and Subspace Embeddings
We show that both clustering and subspace embeddings can be performed in the streaming model with the same asymptotic efficiency as in the central/offline setting. For $(k, z)$-clustering in the streaming model, we achieve a number of word…
On Socially Fair Low-Rank Approximation and Column Subset Selection
Low-rank approximation and column subset selection are two fundamental and related problems that are applied across a wealth of machine learning applications. In this paper, we study the question of socially fair low-rank approximation and…
Adversarially Robust Dense-Sparse Tradeoffs via Heavy-Hitters
In the adversarial streaming model, the input is a sequence of adaptive updates that defines an underlying dataset and the goal is to approximate, collect, or compute some statistic while using space sublinear in the size of the dataset. I…
On Approximability of $\ell_2^2$ Min-Sum Clustering
The $\ell_2^2$ min-sum $k$-clustering problem is to partition an input set into clusters $C_1,\ldots,C_k$ to minimize $\sum_{i=1}^k\sum_{p,q\in C_i}\|p-q\|_2^2$. Although $\ell_2^2$ min-sum $k$-clustering is NP-hard, it is not known whethe…
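Since the objective is fully stated, a small sketch that evaluates it for a given partition may help: it sums $\|p-q\|_2^2$ over all ordered pairs within each cluster, exactly as the formula is written. The points and the partition are hypothetical, and this evaluates the cost only; it is not an approximation algorithm for the problem.

```python
def min_sum_cost(clusters):
    """clusters: list of clusters, each a list of points (tuples of floats).
    Returns sum over clusters of sum_{p,q in C} ||p - q||_2^2 (ordered pairs)."""
    cost = 0.0
    for C in clusters:
        for p in C:
            for q in C:
                cost += sum((pi - qi) ** 2 for pi, qi in zip(p, q))
    return cost

# Example partition with k = 2 clusters.
clusters = [
    [(0.0, 0.0), (1.0, 0.0)],               # C_1
    [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)],   # C_2
]
print(min_sum_cost(clusters))  # 10.0
```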
A Strong Separation for Adversarially Robust $\ell_0$ Estimation for Linear Sketches
The majority of streaming problems are defined and analyzed in a static setting, where the data stream is any worst-case sequence of insertions and deletions that is fixed in advance. However, many real-world applications require a more fl…
Fair Submodular Cover
Submodular optimization is a fundamental problem with many applications in machine learning, often involving decision-making over datasets with sensitive attributes such as gender or age. In such settings, it is often desirable to produce …
Streaming Algorithms with Few State Changes
In this paper, we study streaming algorithms that minimize the number of changes made to their internal state (i.e., memory contents). While the design of streaming algorithms typically focuses on minimizing space and update time, these me…
Private Vector Mean Estimation in the Shuffle Model: Optimal Rates Require Many Messages
We study the problem of private vector mean estimation in the shuffle model of privacy where $n$ users each have a unit vector $v^{(i)} \in\mathbb{R}^d$. We propose a new multi-message protocol that achieves the optimal error using $\tilde…
Near-Optimal $k$-Clustering in the Sliding Window Model
Clustering is an important technique for identifying structural information in large-scale data analysis, where the underlying dataset may be too large to store. In many applications, recent data can provide more accurate information and t…
Streaming Euclidean $k$-median and $k$-means with $o(\log n)$ Space
We consider the classic Euclidean $k$-median and $k$-means objective on data streams, where the goal is to provide a $(1+\varepsilon)$-approximation to the optimal $k$-median or $k$-means solution, while using as little memory as possible.…
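For context on what is being approximated, the sketch below evaluates the offline Euclidean $k$-median ($z=1$) and $k$-means ($z=2$) cost of a candidate set of $k$ centers: each point contributes its distance (or squared distance) to its nearest center. The points and centers are hypothetical, and nothing here reflects the streaming algorithm or its memory usage.

```python
import math

def clustering_cost(points, centers, z):
    """z = 1 gives the k-median cost, z = 2 gives the k-means cost."""
    cost = 0.0
    for p in points:
        nearest = min(math.dist(p, c) for c in centers)
        cost += nearest ** z
    return cost

points = [(0.0, 0.0), (1.0, 1.0), (9.0, 9.0), (10.0, 10.0)]
centers = [(0.5, 0.5), (9.5, 9.5)]        # a candidate solution with k = 2
print(clustering_cost(points, centers, z=1))  # k-median cost
print(clustering_cost(points, centers, z=2))  # k-means cost
```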
Differentially Private Aggregation via Imperfect Shuffling
In this paper, we introduce the imperfect shuffle differential privacy model, where messages sent from users are shuffled in an almost uniform manner before being observed by a curator for private aggregation. We then consider the private …
Private Data Stream Analysis for Universal Symmetric Norm Estimation
We study how to release summary statistics on a data stream subject to the constraint of differential privacy. In particular, we focus on releasing the family of symmetric norms, which are invariant under sign-flips and coordinate-wise per…
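To illustrate only the symmetry property mentioned here, the sketch below uses the top-$k$ norm (sum of the $k$ largest absolute values), a standard example of a symmetric norm, and checks numerically that it is unchanged by sign flips and coordinate permutations. The vector and the choice $k=2$ are hypothetical, and this has nothing to do with the private estimator itself.

```python
import random

def top_k_norm(v, k):
    """Sum of the k largest absolute values: a standard symmetric norm."""
    return sum(sorted((abs(x) for x in v), reverse=True)[:k])

v = [3.0, -1.0, 4.0, 1.0, -5.0]
base = top_k_norm(v, k=2)

# Invariance under a random sign flip and a random coordinate permutation.
flipped = [x * random.choice([-1, 1]) for x in v]
permuted = random.sample(v, len(v))
assert abs(top_k_norm(flipped, 2) - base) < 1e-12
assert abs(top_k_norm(permuted, 2) - base) < 1e-12
print(base)  # 9.0 = |-5| + |4|
```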
Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix Factorization
We introduce efficient $(1+\varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem, where the inputs are a matrix $\mathbf{A}\in\{0,1\}^{n\times d}$, a rank parameter $k>0$, as well as an accuracy parameter…
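The snippet is truncated before the objective is stated, so as an assumption the sketch below uses the standard BMF error: given binary factors $\mathbf{U}\in\{0,1\}^{n\times k}$ and $\mathbf{V}\in\{0,1\}^{k\times d}$, count the entries where the Boolean product $\mathbf{U}\mathbf{V}$ disagrees with $\mathbf{A}$. The matrices below are hypothetical example data, and this evaluates the error only rather than computing a factorization.

```python
import numpy as np

def bmf_error(A, U, V):
    """Number of entries where the Boolean product U V disagrees with A
    (for 0/1 matrices this equals the squared Frobenius error)."""
    recon = (U @ V > 0).astype(np.int8)  # Boolean matrix product
    return int(np.sum(recon != A))

A = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 0, 1]], dtype=np.int8)
U = np.array([[1, 0], [1, 0], [0, 1]], dtype=np.int8)   # n x k with k = 2
V = np.array([[1, 1, 0], [0, 0, 1]], dtype=np.int8)     # k x d
print(bmf_error(A, U, V))  # 0: this rank-2 binary factorization is exact
```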
Robust Algorithms on Adaptive Inputs from Bounded Adversaries
We study dynamic algorithms robust to adaptive input generated from sources with bounded capabilities, such as sparsity or limited interaction. For example, we consider robust linear algebraic algorithms when the updates to the input are s…
Provable Data Subset Selection For Efficient Neural Network Training
Radial basis function neural networks (RBFNN) are well-known for their capability to approximate any continuous function on a closed bounded set with arbitrary precision given enough hidden neurons. In this paper, we introduce the…
Streaming Algorithms for Learning with Experts: Deterministic Versus Robust
In the online learning with experts problem, an algorithm must make a prediction about an outcome on each of $T$ days (or times), given a set of $n$ experts who make predictions on each day (or time). The algorithm is given feedback on the…
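To make the setting concrete, here is the classical randomized multiplicative-weights baseline for prediction with expert advice: on each day the learner follows an expert drawn in proportion to its weight, then down-weights every expert that was wrong. It keeps one weight per expert in memory, which is the kind of cost the paper's streaming algorithms aim to reduce; it is not the paper's algorithm, and the predictions, outcomes, and learning rate below are hypothetical.

```python
import random

def multiplicative_weights(expert_predictions, outcomes, eta=0.5):
    """expert_predictions[t][i]: 0/1 prediction of expert i on day t.
    outcomes[t]: true 0/1 outcome on day t. Returns the learner's mistake count."""
    n = len(expert_predictions[0])
    weights = [1.0] * n
    mistakes = 0
    for preds, outcome in zip(expert_predictions, outcomes):
        # Follow one expert, chosen with probability proportional to its weight.
        r = random.random() * sum(weights)
        acc, choice = 0.0, n - 1
        for i, w in enumerate(weights):
            acc += w
            if r <= acc:
                choice = i
                break
        if preds[choice] != outcome:
            mistakes += 1
        # Penalize every expert that was wrong on this day.
        weights = [w * (1 - eta) if p != outcome else w
                   for w, p in zip(weights, preds)]
    return mistakes

# Example: T = 4 days, n = 3 experts; expert 0 is always correct.
preds = [[1, 0, 1], [0, 0, 1], [1, 1, 0], [1, 0, 0]]
truth = [1, 0, 1, 1]
print(multiplicative_weights(preds, truth))
```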
Selective experience replay compression using coresets for lifelong deep reinforcement learning in medical imaging
Selective experience replay is a popular strategy for integrating lifelong learning with deep reinforcement learning. Selective experience replay aims to recount selected experiences from previous tasks to avoid catastrophic forgetting. Fu…
Differentially Private $L_2$-Heavy Hitters in the Sliding Window Model
The data management of large companies often prioritizes more recent data as a source of higher-accuracy predictions than outdated data. For example, the Facebook data policy retains user search histories for $6$ months while the Google dat…
On Differential Privacy and Adaptive Data Analysis with Bounded Space
We study the space complexity of the two related fields of differential privacy and adaptive data analysis. Specifically, (1) Under standard cryptographic assumptions, we show that there exists a problem P that requires exponentially more …
How to Make Your Approximation Algorithm Private: A Black-Box Differentially-Private Transformation for Tunable Approximation Algorithms of Functions with Low Sensitivity
We develop a framework for efficiently transforming certain approximation algorithms into differentially-private variants, in a black-box manner. Specifically, our results focus on algorithms A that output an approximation to a function f …
Sub-quadratic Algorithms for Kernel Matrices via Kernel Density Estimation
Kernel matrices, as well as weighted graphs represented by them, are ubiquitous objects in machine learning, statistics and other related fields. The main drawback of using kernel methods (learning and inference using kernel matrices) is e…
Optimal Algorithms for Linear Algebra in the Current Matrix Multiplication Time
We study fundamental problems in linear algebra, such as finding a maximal linearly independent subset of rows or columns (a basis), solving linear regression, or computing a subspace embedding. For these problems, we consider input matric…
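As a dense, unoptimized reference for two of the tasks named here, the sketch below selects a maximal linearly independent subset of rows (a row basis) by greedy rank checks and solves least-squares linear regression with NumPy. It illustrates the problem definitions only, not the paper's matrix-multiplication-time algorithms; the matrix and right-hand side are hypothetical.

```python
import numpy as np

def greedy_row_basis(A, tol=1e-10):
    """Return indices of a maximal linearly independent subset of rows of A."""
    chosen, rank = [], 0
    for i in range(A.shape[0]):
        candidate = A[chosen + [i], :]
        if np.linalg.matrix_rank(candidate, tol=tol) > rank:
            chosen.append(i)
            rank += 1
    return chosen

A = np.array([[1.0, 0.0, 1.0],
              [2.0, 0.0, 2.0],   # dependent on row 0
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])  # dependent on rows 0 and 2
b = np.array([1.0, 2.0, 3.0, 4.0])

print(greedy_row_basis(A))                 # [0, 2]
x, *_ = np.linalg.lstsq(A, b, rcond=None)  # least-squares linear regression
print(x)
```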
Near-Linear Sample Complexity for $L_p$ Polynomial Regression
We study $L_p$ polynomial regression. Given query access to a function $f:[-1,1] \rightarrow \mathbb{R}$, the goal is to find a degree $d$ polynomial $\hat{q}$ such that, for a given parameter $\varepsilon > 0$, $$ \|\hat{q}-f\|_p\le (1+\v…
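As a naive baseline for the $p=2$ case of this goal, the sketch below queries $f$ on uniformly spaced points in $[-1,1]$ and fits a degree-$d$ polynomial by least squares with `numpy.polyfit`. Uniform sampling is only a reference point, not the paper's carefully weighted sampling scheme for general $L_p$; the target function $f$, degree, and sample count are hypothetical choices.

```python
import numpy as np

def fit_degree_d(f, d, num_samples=200):
    """Fit a degree-d polynomial to f on [-1, 1] via least squares (p = 2)."""
    xs = np.linspace(-1.0, 1.0, num_samples)   # query points for f
    ys = f(xs)
    coeffs = np.polyfit(xs, ys, deg=d)
    return np.poly1d(coeffs)

f = lambda x: np.exp(x)        # example target function on [-1, 1]
q_hat = fit_degree_d(f, d=3)

# Empirical L_2 error of the fit on a fine grid.
grid = np.linspace(-1.0, 1.0, 2001)
print(np.sqrt(np.mean((q_hat(grid) - f(grid)) ** 2)))
```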
Learning-Augmented Algorithms for Online Linear and Semidefinite Programming
Semidefinite programming (SDP) is a unifying framework that generalizes both linear programming and quadratically-constrained quadratic programming, while also yielding efficient solvers, both in theory and in practice. However, there exis…