Gur-Eyal Sela
Towards Efficient and Practical GPU Multitasking in the Era of LLM
GPU singletasking is becoming increasingly inefficient and unsustainable as hardware capabilities grow and workloads diversify. We are now at an inflection point where GPUs must embrace multitasking, much like CPUs did decades ago, to meet…
Context-Aware Streaming Perception in Dynamic Environments
Efficient vision works maximize accuracy under a latency budget. These works evaluate accuracy offline, one image at a time. However, real-time vision applications like autonomous driving operate in streaming settings, where ground truth c…
Online Learning Demands in Max-min Fairness
We describe mechanisms for the allocation of a scarce resource among multiple users in a way that is efficient, fair, and strategy-proof, but when users do not know their resource requirements. The mechanism is repeated for multiple rounds…
InferLine
Serving ML prediction pipelines spanning multiple models and hardware accelerators is a key challenge in production machine learning. Optimally configuring these pipelines to meet tight end-to-end latency goals is complicated by the intera…
InferLine: ML Inference Pipeline Composition Framework.
InferLine: ML Prediction Pipeline Provisioning and Management for Tight Latency Objectives
Serving ML prediction pipelines spanning multiple models and hardware accelerators is a key challenge in production machine learning. Optimally configuring these pipelines to meet tight end-to-end latency goals is complicated by the intera…
Supercloud
Infrastructure-as-a-Service (IaaS) cloud providers hide available interfaces for virtual machine (VM) placement and migration, CPU capping, memory ballooning, page sharing, and I/O throttling, limiting the ways in which applications can op…
Follow the Sun through the Clouds
Global cloud services have to respond to workloads that shift geographically as a function of time-of-day or in response to special events. While many such services have support for adding nodes in one region and removing nodes in another,…