Laxman Dhulipala
TD-Orch: Scalable Load-Balancing for Distributed Systems with Applications to Graph Processing
In this paper, we introduce a task-data orchestration abstraction that supports a range of distributed applications, including graph processing and key-value stores. Given a batch of lambda tasks each requesting one or more data items, whe…
PIM-tree: A Skew-resistant Index for Processing-in-Memory
The performance of today’s in-memory indexes is bottlenecked by the memory latency/bandwidth wall. Processing-in-memory (PIM) is an emerging approach that potentially mitigates this bottleneck by enabling low-latency memory access whose ag…
Efficiently Constructing Sparse Navigable Graphs
Graph-based nearest neighbor search methods have seen a surge of popularity in recent years, offering state-of-the-art performance across a wide variety of applications. Central to these methods is the task of constructing a sparse navigab…
Optimal Batch-Dynamic kd-trees for Processing-in-Memory with Applications
Scaling Parallel Algorithms to Massive Datasets using Multi-SSD Machines
Fully-Dynamic Parallel Algorithms for Single-Linkage Clustering
Techniques for Practical Parallel BFS and SSSP
Fast and Scalable Parallel External-Memory Construction of Colored Compacted de Bruijn Graphs with Cuttlefish 3
The rapid growth of genomic data over the past decade has made scalable and efficient sequence analysis algorithms, particularly for constructing de Bruijn graphs and their colored and compacted variants, critical components of many bioinfo…
DynHAC: Fully Dynamic Approximate Hierarchical Agglomerative Clustering
We consider the problem of maintaining a hierarchical agglomerative clustering (HAC) in the dynamic setting, when the input is subject to point insertions and deletions. We introduce DynHAC - the first dynamic HAC algorithm for the popular…
Towards Scalable and Practical Batch-Dynamic Connectivity
We study the problem of dynamically maintaining the connected components of an undirected graph subject to edge insertions and deletions. We give the first parallel algorithm for the problem which is work-efficient, supports batches of upd…
The ParClusterers Benchmark Suite (PCBS): A Fine-Grained Analysis of Scalable Graph Clustering
We introduce the ParClusterers Benchmark Suite (PCBS) -- a collection of highly scalable parallel graph clustering algorithms and benchmarking tools that streamline comparing different graph clustering algorithms and implementations. The b…
Results of the Big ANN: NeurIPS'23 competition
The 2023 Big ANN Challenge, held at NeurIPS 2023, focused on advancing the state-of-the-art in indexing data structures and search algorithms for practical variants of Approximate Nearest Neighbor (ANN) search that reflect the growing comp…
CLIP-Embedded RedCaps Text-Image Dataset
This dataset was created by applying the CLIP embedding to the RedCaps dataset. Queries were generated by OpenAI's GPT model to simulate textual queries that search multimodal content, and were then embedded via CLIP. The data was curated by Desai, Kaul, Ay…
Efficient Centroid-Linkage Clustering
We give an efficient algorithm for Centroid-Linkage Hierarchical Agglomerative Clustering (HAC), which computes a $c$-approximate clustering in roughly $n^{1+O(1/c^2)}$ time. We obtain our result by combining a new Centroid-Linkage HAC alg…
MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings
Neural embedding models have become a fundamental component of modern information retrieval (IR) pipelines. These models produce a single embedding $x \in \mathbb{R}^d$ per data-point, allowing for fast retrieval via highly optimized maxim…
BYO: A Unified Framework for Benchmarking Large-Scale Graph Containers
A fundamental building block in any graph algorithm is a graph container - a data structure used to represent the graph. Ideally, a graph container enables efficient access to the underlying graph, has low space usage, and supports updatin…
Optimal Parallel Algorithms for Dendrogram Computation and Single-Linkage Clustering
Computing a Single-Linkage Dendrogram (SLD) is a key step in the classic single-linkage hierarchical clustering algorithm. Given an input edge-weighted tree $T$, the SLD of $T$ is a binary dendrogram that summarizes the $n-1$ clusterings o…
Fast, parallel, and cache-friendly suffix array construction
Purpose: String indexes such as the suffix array (SA) and the closely related longest common prefix (LCP) array are fundamental objects in bioinformatics and have a wide variety of applications. Despite their importance in practice, few…
Parallel Algorithms for Hierarchical Nucleus Decomposition
Nucleus decompositions have been shown to be a useful tool for finding dense subgraphs. The coreness value of a clique represents its density based on the number of other cliques it is adjacent to. One useful output of nucleus decompositio…
Fine-Grained Privacy Guarantees for Coverage Problems
We introduce a new notion of neighboring databases for coverage problems such as Max Cover and Set Cover under differential privacy. In contrast to the standard privacy notion for these problems, which is analogous to node-privacy in graph…
Unleashing Graph Partitioning for Large-Scale Nearest Neighbor Search
We consider the fundamental problem of decomposing a large-scale approximate nearest neighbor search (ANNS) problem into smaller sub-problems. The goal is to partition the input points into neighborhood-preserving shards, so that the neare…
ParlayANN: Scalable and Deterministic Parallel Graph-Based Approximate Nearest Neighbor Search Algorithms
Approximate nearest-neighbor search (ANNS) algorithms are a key part of the modern deep learning stack because they enable efficient similarity search over high-dimensional vector space representations (i.e., embeddings) of data. Among various…
Parallel Integer Sort: Theory and Practice
Integer sorting is a fundamental problem in computer science. This paper studies parallel integer sort both in theory and in practice. In theory, we show tighter bounds for a class of existing practical integer sort algorithms, which provi…
It’s Hard to HAC Average Linkage!
Average linkage Hierarchical Agglomerative Clustering (HAC) is an extensively studied and applied method for hierarchical clustering. Recent applications to massive datasets have driven significant interest in near-linear-time and efficien…
Parallel Set Cover and Hypergraph Matching via Uniform Random Sampling
The SetCover problem has been extensively studied in many different models of computation, including parallel and distributed settings. From an approximation point of view, there are two standard guarantees: an O(log Δ)-approximation (wher…
Near-Optimal Differentially Private k-Core Decomposition
Recent work by Dhulipala et al. \cite{DLRSSY22} initiated the study of the $k$-core decomposition problem under differential privacy via a connection between low round/depth distributed/parallel graph algorithms and private algorithms with…
TeraHAC: Hierarchical Agglomerative Clustering of Trillion-Edge Graphs
We introduce TeraHAC, a (1+ε)-approximate hierarchical agglomerative clustering (HAC) algorithm which scales to trillion-edge graphs. Our algorithm is based on a new approach to computing (1+ε)-approximate HAC, which is a novel combination…
Practical Parallel Algorithms for Near-Optimal Densest Subgraphs on Massive Graphs
The densest subgraph problem has received significant attention, both in theory and in practice, due to its applications in problems such as community detection, social network analysis, and spam detection. Due to the high cost of obtainin…