Explanipedia

Estimating the Electoral Consequences of Legislative Redistricting in France Open

Evripidis Bampis, Thomas Ehrhard, Bruno Escoffier, Claire Mathieu, Fanny Pascual, et al. · 2025

Computer science Political science

International audience

Differentially Private Federated $k$-Means Clustering with Server-Side Data Open

Jonathan Scott, Christoph H. Lampert, David Saulpic · 2025

Clustering is a cornerstone of data analysis that is particularly suited to identifying coherent subgroups or substructures in unlabeled data, as are generated continuously in large amounts these days. However, in many cases traditional cl…

A Tight VC-Dimension Analysis of Clustering Coresets with Applications Open

Vincent Cohen-Addad, Andrew Draganov, Matteo Russo, David Saulpic, Chris Schwiegelshohn · 2025

Computer science Mathematics

We consider coresets for $k$-clustering problems, where the goal is to assign points to centers minimizing powers of distances. A popular example is the $k$-median objective $\sum_{p}\min_{c\in C}dist(p,c)$. Given a point set $P$, a corese…

Sensitivity Sampling for $k$-Means: Worst Case and Stability Optimal Coreset Bounds Open

Nikhil Bansal, Vincent Cohen-Addad, Milind Prabhu, David Saulpic, Chris Schwiegelshohn · 2024

Computer science Mathematics Engineering

International audience

Making Old Things New: A Unified Algorithm for Differentially Private Clustering Open

Max Dupré la Tour, Monika Henzinger, David Saulpic · 2024

Computer science

As a staple of data analysis and unsupervised learning, the problem of private clustering has been widely studied under various privacy models. Centralized differential privacy is the first of them, and the problem has also been studied fo…

Settling Time vs. Accuracy Tradeoffs for Clustering Big Data Open

Andrew Draganov, David Saulpic, Chris Schwiegelshohn · 2024

Computer science Engineering

We study the theoretical and practical runtime limits of k-means and k-median clustering on large datasets. Since effectively all clustering methods are slower than the time it takes to read the dataset, the fastest approach is to quickly …

Sensitivity Sampling for $k$-Means: Worst Case and Stability Optimal Coreset Bounds Open

Nikhil Bansal, Vincent Cohen-Addad, Milind Prabhu, David Saulpic, Chris Schwiegelshohn · 2024

Mathematics Computer science Engineering

Coresets are arguably the most popular compression paradigm for center-based clustering objectives such as $k$-means. Given a point set $P$, a coreset $Ω$ is a small, weighted summary that preserves the cost of all candidate solutions $S$ …

Settling Time vs. Accuracy Tradeoffs for Clustering Big Data Open

Andrew Draganov, David Saulpic, Chris Schwiegelshohn · 2024

Computer science Environmental science

We study the theoretical and practical runtime limits of k-means and k-median clustering on large datasets. Since effectively all clustering methods are slower than the time it takes to read the dataset, the fastest approach is to quickly …

Data-Efficient Learning via Clustering-Based Sensitivity Sampling: Foundation Models and Beyond Open

Kyriakos Axiotis, Vincent Cohen-Addad, Monika Henzinger, Sammy Jerome, Vahab Mirrokni, et al. · 2024

Computer science Geography Engineering

We study the data selection problem, whose aim is to select a small representative subset of data that can be used to efficiently train a machine learning model. We present a new data selection approach based on $k$-means clustering and se…

Experimental Evaluation of Fully Dynamic k-Means via Coresets Open

Monika Henzinger, David Saulpic, Leonhard Sidl · 2024

Computer science Mathematics Philosophy

International audience

Experimental Evaluation of Fully Dynamic k-Means via Coresets Open

Monika Henzinger, David Saulpic, Leonhard Sidl · 2023

Computer science Mathematics Biology

For a set of points in $\mathbb{R}^d$, the Euclidean $k$-means problem consists of finding $k$ centers such that the sum of squared distances from each data point to its closest center is minimized. Coresets are one of the main tools develop…

Deterministic Clustering in High Dimensional Spaces: Sketches and Approximation Open

Vincent Cohen-Addad, David Saulpic, Chris Schwiegelshohn · 2023

Computer science Mathematics Physics

In all state-of-the-art sketching and coreset techniques for clustering, as well as in the best known fixed-parameter tractable approximation algorithms, randomness plays a key role. For the classic $k$-median and $k$-means problems, there…

Differential Privacy for Clustering Under Continual Observation Open

Max Dupré la Tour, Monika Henzinger, David Saulpic · 2023

Mathematics Computer science

We consider the problem of privately clustering a dataset in $\mathbb{R}^d$ that undergoes both insertions and deletions of points. Specifically, we give an $\varepsilon$-differentially private clustering mechanism for the $k$-means objectiv…

Scalable Differentially Private Clustering via Hierarchically Separated Trees Open

Vincent Cohen-Addad, Alessandro Epasto, Silvio Lattanzi, Vahab Mirrokni, Andrés Muñoz Medina, et al. · 2022

Computer science Mathematics Chemistry

We study the private k-median and k-means clustering problem in d-dimensional Euclidean space. By leveraging tree embeddings, we give an efficient and easy-to-implement algorithm that is empirically competitive with state-of-the-art non p…

Community Recovery in the Degree-Heterogeneous Stochastic Block Model Open

Vincent Cohen-Addad, Frederik Mallmann-Trenn, David Saulpic · 2022

Computer science Mathematics Physics

International audience

Scalable Differentially Private Clustering via Hierarchically Separated Trees Open

Vincent Cohen-Addad, Alessandro Epasto, Silvio Lattanzi, Vahab Mirrokni, Andrés Muñoz Medina, et al. · 2022

Computer science Mathematics Chemistry

We study the private $k$-median and $k$-means clustering problem in $d$-dimensional Euclidean space. By leveraging tree embeddings, we give an efficient and easy-to-implement algorithm that is empirically competitive with state-of-the-art…

Towards optimal lower bounds for k-median and k-means coresets Open

Vincent Cohen-Addad, Kasper Green Larsen, David Saulpic, Chris Schwiegelshohn · 2022

Computer science Mathematics

The (k,z)-clustering problem consists of finding a set of k points called centers, such that the sum of distances raised to the power of z of every data point to its closest center is minimized. Among the most commonly encountered special …

Towards Optimal Lower Bounds for k-median and k-means Coresets Open

Vincent Cohen-Addad, Kasper Green Larsen, David Saulpic, Chris Schwiegelshohn · 2022

Mathematics Physics Chemistry

Given a set of points in a metric space, the $(k,z)$-clustering problem consists of finding a set of $k$ points called centers, such that the sum of distances raised to the power of $z$ of every data point to its closest center is minimize…

An Improved Local Search Algorithm for k-Median Open

Vincent Cohen-Addad, Anupam Gupta, Lunjia Hu, Hoon Oh, David Saulpic · 2022

Mathematics Computer science Biology

In this work, we study k-min-sum-of-radii (k-MSR) clustering under mergeable constraints. k-MSR seeks to group data points using a set of up to k balls, such that the sum of the radii of the balls is minimized. A clustering constraint is c…

An Improved Local Search Algorithm for k-Median Open

Vincent Cohen-Addad, Anupam Gupta, Lunjia Hu, Hoon Oh, David Saulpic · 2021

Computer science Mathematics Biology

We present a new local-search algorithm for the $k$-median clustering problem. We show that local optima for this algorithm give a $(2.836+ε)$-approximation; our result improves upon the $(3+ε)$-approximate local-search algorithm of Arya e…

Near-linear Time Approximation Schemes for Clustering in Doubling Metrics Open

Vincent Cohen-Addad, Andreas Emil Feldmann, David Saulpic · 2021

Mathematics Economics

We consider the classic Facility Location, k-Median, and k-Means problems in metric spaces of doubling dimension d. We give nearly linear-time approximation schemes for each problem. The complexity of our algorithms is Õ(2^{(1/ε)^{O(d^2)}} n…

A new coreset framework for clustering Open

Vincent Cohen-Addad, David Saulpic, Chris Schwiegelshohn · 2021

Computer science Mathematics Engineering

Given a metric space, the $(k,z)$-clustering problem consists of finding $k$ centers such that the sum of distances raised to the power $z$ of every point to its closest center is minimized. This encapsulates the famous $k$-median (…

On the Power of Louvain in the Stochastic Block Model Open

Vincent Cohen-Addad, Adrian Kosowski, Frederik Mallmann-Trenn, David Saulpic · 2020

Computer science Mathematics Physics

A classic problem in machine learning and data analysis is to partition the vertices of a network in such a way that vertices in the same set are densely connected and vertices in different sets are loosely connected. In practice, the most…

Polynomial Time Approximation Schemes for Clustering in Low Highway Dimension Graphs Open

Andreas Emil Feldmann, David Saulpic · 2020

Computer science Mathematics

We study clustering problems such as k-Median, k-Means, and Facility Location in graphs of low highway dimension, which is a graph parameter modeling transportation networks. It was previously shown that approximation schemes for these …

Dominating Sets and Connected Dominating Sets in Dynamic Graphs Open

Niklas Hjuler, Giuseppe F. Italiano, Nikos Parotsidis, David Saulpic · 2019

Mathematics Computer science

In this paper we study the dynamic versions of two basic graph problems: Minimum Dominating Set and its variant Minimum Connected Dominating Set. For those two problems, we present algorithms that maintain a solution under edge insertions …

Polynomial-Time Approximation Schemes for k-center, k-median, and Capacitated Vehicle Routing in Bounded Highway Dimension Open

Amariah Becker, Philip N. Klein, David Saulpic · 2018

Mathematics Computer science

The concept of bounded highway dimension was developed to capture observed properties of road networks. We show that a graph of bounded highway dimension with a distinguished root vertex can be embedded into a graph of bounded treewidth in…

Generating Functionally Equivalent Programs Having Non-Isomorphic Control-Flow Graphs Open

Rémi Géraud, Mirko Koscina, Paul Lenczner, David Naccache, David Saulpic · 2017

Computer science

One of the big challenges in program obfuscation consists in modifying not only the program's straight-line code (SLC) but also the program's control flow graph (CFG). Indeed, if only SLC is modified, the program's CFG can be extracted and…

Polynomial-Time Approximation Schemes for k-Center and Bounded-Capacity Vehicle Routing in Metrics with Bounded Highway Dimension Open

Amariah Becker, Philip N. Klein, David Saulpic · 2017

Mathematics Computer science

The concept of bounded highway dimension was developed to capture observed properties of the metrics of road networks. We show that a graph with bounded highway dimension, for any vertex, can be embedded into a graph of bounded treewidth…

David Saulpic