David Saulpic
Estimating the Electoral Consequences of Legislative Redistricting in France Open
International audience
Differentially Private Federated $k$-Means Clustering with Server-Side Data Open
Clustering is a cornerstone of data analysis that is particularly suited to identifying coherent subgroups or substructures in unlabeled data, as are generated continuously in large amounts these days. However, in many cases traditional cl…
A Tight VC-Dimension Analysis of Clustering Coresets with Applications Open
We consider coresets for $k$-clustering problems, where the goal is to assign points to centers minimizing powers of distances. A popular example is the $k$-median objective $\sum_{p}\min_{c\in C}\mathrm{dist}(p,c)$. Given a point set $P$, a corese…
Sensitivity Sampling for $k$-Means: Worst Case and Stability Optimal Coreset Bounds Open
International audience
Making Old Things New: A Unified Algorithm for Differentially Private Clustering Open
As a staple of data analysis and unsupervised learning, the problem of private clustering has been widely studied under various privacy models. Centralized differential privacy is the first of them, and the problem has also been studied fo…
Settling Time vs. Accuracy Tradeoffs for Clustering Big Data Open
We study the theoretical and practical runtime limits of k-means and k-median clustering on large datasets. Since effectively all clustering methods are slower than the time it takes to read the dataset, the fastest approach is to quickly …
Sensitivity Sampling for $k$-Means: Worst Case and Stability Optimal Coreset Bounds Open
Coresets are arguably the most popular compression paradigm for center-based clustering objectives such as $k$-means. Given a point set $P$, a coreset $Ω$ is a small, weighted summary that preserves the cost of all candidate solutions $S$ …
Data-Efficient Learning via Clustering-Based Sensitivity Sampling: Foundation Models and Beyond Open
We study the data selection problem, whose aim is to select a small representative subset of data that can be used to efficiently train a machine learning model. We present a new data selection approach based on $k$-means clustering and se…
Experimental Evaluation of Fully Dynamic $k$-Means via Coresets Open
International audience
Experimental Evaluation of Fully Dynamic k-Means via Coresets Open
For a set of points in $\mathbb{R}^d$, the Euclidean $k$-means problem consists of finding $k$ centers such that the sum of squared distances from each data point to its closest center is minimized. Coresets are one of the main tools develop…
Deterministic Clustering in High Dimensional Spaces: Sketches and Approximation Open
In all state-of-the-art sketching and coreset techniques for clustering, as well as in the best known fixed-parameter tractable approximation algorithms, randomness plays a key role. For the classic $k$-median and $k$-means problems, there…
Differential Privacy for Clustering Under Continual Observation Open
We consider the problem of privately clustering a dataset in $\mathbb{R}^d$ that undergoes both insertions and deletions of points. Specifically, we give an $\varepsilon$-differentially private clustering mechanism for the $k$-means objectiv…
Scalable Differentially Private Clustering via Hierarchically Separated Trees Open
We study the private k-median and k-means clustering problem in d-dimensional Euclidean space. By leveraging tree embeddings, we give an efficient and easy-to-implement algorithm that is empirically competitive with state-of-the-art non p…
Community Recovery in the Degree-Heterogeneous Stochastic Block Model Open
International audience
Scalable Differentially Private Clustering via Hierarchically Separated Trees Open
We study the private $k$-median and $k$-means clustering problem in $d$-dimensional Euclidean space. By leveraging tree embeddings, we give an efficient and easy-to-implement algorithm that is empirically competitive with state-of-the-art…
Towards optimal lower bounds for k-median and k-means coresets Open
The (k,z)-clustering problem consists of finding a set of k points called centers, such that the sum of distances raised to the power of z of every data point to its closest center is minimized. Among the most commonly encountered special …
Towards Optimal Lower Bounds for k-median and k-means Coresets Open
Given a set of points in a metric space, the $(k,z)$-clustering problem consists of finding a set of $k$ points called centers, such that the sum of distances raised to the power of $z$ of every data point to its closest center is minimize…
An Improved Local Search Algorithm for $k$-Median Open
In this work, we study k-min-sum-of-radii (k-MSR) clustering under mergeable constraints. k-MSR seeks to group data points using a set of up to k balls, such that the sum of the radii of the balls is minimized. A clustering constraint is c…
An Improved Local Search Algorithm for k-Median Open
We present a new local-search algorithm for the $k$-median clustering problem. We show that local optima for this algorithm give a $(2.836+ε)$-approximation; our result improves upon the $(3+ε)$-approximate local-search algorithm of Arya e…
Near-linear Time Approximation Schemes for Clustering in Doubling Metrics Open
We consider the classic Facility Location, $k$-Median, and $k$-Means problems in metric spaces of doubling dimension $d$. We give nearly linear-time approximation schemes for each problem. The complexity of our algorithms is $\tilde{O}(2^{(1/\varepsilon)^{O(d^2)}} n$…
A new coreset framework for clustering Open
Given a metric space, the $(k,z)$-clustering problem consists of finding $k$ centers such that the sum of distances raised to the power $z$ of every point to its closest center is minimized. This encapsulates the famous $k$-median (…
On the Power of Louvain in the Stochastic Block Model Open
A classic problem in machine learning and data analysis is to partition the vertices of a network in such a way that vertices in the same set are densely connected and vertices in different sets are loosely connected. In practice, the most…
Polynomial Time Approximation Schemes for Clustering in Low Highway Dimension Graphs Open
We study clustering problems such as k-Median, k-Means, and Facility Location in graphs of low highway dimension, which is a graph parameter modeling transportation networks. It was previously shown that approximation schemes for these …
Dominating Sets and Connected Dominating Sets in Dynamic Graphs Open
In this paper we study the dynamic versions of two basic graph problems: Minimum Dominating Set and its variant Minimum Connected Dominating Set. For those two problems, we present algorithms that maintain a solution under edge insertions …
Polynomial-Time Approximation Schemes for k-center, k-median, and Capacitated Vehicle Routing in Bounded Highway Dimension Open
The concept of bounded highway dimension was developed to capture observed properties of road networks. We show that a graph of bounded highway dimension with a distinguished root vertex can be embedded into a graph of bounded treewidth in…
Generating Functionally Equivalent Programs Having Non-Isomorphic Control-Flow Graphs Open
One of the big challenges in program obfuscation consists in modifying not only the program's straight-line code (SLC) but also the program's control flow graph (CFG). Indeed, if only SLC is modified, the program's CFG can be extracted and…
Polynomial-Time Approximation Schemes for k-Center and Bounded-Capacity Vehicle Routing in Metrics with Bounded Highway Dimension. Open
The concept of bounded highway dimension was developed to capture observed properties of the metrics of road networks. We show that a graph with bounded highway dimension, for any vertex, can be embedded into a graph of bounded treewidth…