Khuzaima Daudjee
Zorse: Optimizing LLM Training Efficiency on Heterogeneous GPU Clusters
Large language models (LLMs) require vast amounts of GPU compute to train, but limited availability and high costs of GPUs make homogeneous clusters impractical for many organizations. Instead, assembling heterogeneous clusters by pooling …
Cephalo: Harnessing Heterogeneous GPU Clusters for Training Transformer Models
Training transformer models requires substantial GPU compute and memory resources. In homogeneous clusters, distributed strategies allocate resources evenly, but this approach is inefficient for heterogeneous clusters, where GPUs differ in…
FreeRide: Harvesting Bubbles in Pipeline Parallelism
The occurrence of bubbles in pipeline parallelism is an inherent limitation that can account for more than 40% of the large language model (LLM) training time and is one of the main reasons for the underutilization of GPU resources in LLM …
Practical Hardware Transactional vEB Trees
van Emde Boas (vEB) trees are sequential data structures optimized for extremely fast predecessor and successor queries. Such queries are an important incentive to use ordered sets or maps such as vEB trees. All operations in a vEB tree ar…
The future is big graphs
Ensuring the success of big graph processing for the next decade and beyond.
Klink: Progress-Aware Scheduling for Streaming Data Systems
Modern stream processing engines (SPEs) process large volumes of events propagated at high velocity through multiple queries. To improve performance, existing SPEs generally aim to minimize query output latency by minimizing, in turn, the …
The Future is Big Graphs! A Community View on Graph Processing Systems
Graphs are by nature unifying abstractions that can leverage interconnectedness to represent, explore, predict, and explain real- and digital-world phenomena. Although real users and consumers of graph instances and graph workloads underst…
Providing Serializability for Pregel-like Graph Processing Systems
There is considerable interest in the design and development of distributed systems that can execute algorithms to process large graphs. Serializability guarantees that parallel executions of a graph algorithm produce the same results as s…