
Towards Efficient and Practical GPU Multitasking in the Era of LLM

Jiarong Xing, Yifan Qiao, Xiao-Bing Cui, Gur-Eyal Sela, Ion Stoica · 2025

GPU singletasking is becoming increasingly inefficient and unsustainable as hardware capabilities grow and workloads diversify. We are now at an inflection point where GPUs must embrace multitasking, much like CPUs did decades ago, to meet…

Context-Aware Streaming Perception in Dynamic Environments

Gur-Eyal Sela, Ionel Gog, Justin Wong, Kumar Krishna Agrawal, Xiangxi Mo, et al. · 2022

Efficient vision works maximize accuracy under a latency budget. These works evaluate accuracy offline, one image at a time. However, real-time vision applications like autonomous driving operate in streaming settings, where ground truth c…

Online Learning Demands in Max-min Fairness

Kirthevasan Kandasamy, Gur-Eyal Sela, Joseph E. Gonzalez, Michael I. Jordan, Ion Stoica · 2020

We describe mechanisms for the allocation of a scarce resource among multiple users in a way that is efficient, fair, and strategy-proof, but when users do not know their resource requirements. The mechanism is repeated for multiple rounds…

InferLine

Daniel Crankshaw, Gur-Eyal Sela, Xiangxi Mo, Corey Zumar, Ion Stoica, et al. · 2020

Serving ML prediction pipelines spanning multiple models and hardware accelerators is a key challenge in production machine learning. Optimally configuring these pipelines to meet tight end-to-end latency goals is complicated by the intera…

InferLine: ML Inference Pipeline Composition Framework

Daniel Crankshaw, Gur-Eyal Sela, Corey Zumar, Xiangxi Mo, Joseph E. Gonzalez, et al. · 2018

InferLine: ML Prediction Pipeline Provisioning and Management for Tight Latency Objectives

Daniel Crankshaw, Gur-Eyal Sela, Corey Zumar, Xiangxi Mo, Joseph E. Gonzalez, et al. · 2018

Serving ML prediction pipelines spanning multiple models and hardware accelerators is a key challenge in production machine learning. Optimally configuring these pipelines to meet tight end-to-end latency goals is complicated by the intera…

Supercloud

Zhiming Shen, Qin Jia, Gur-Eyal Sela, Weijia Song, Hakim Weatherspoon, et al. · 2017

Infrastructure-as-a-Service (IaaS) cloud providers hide available interfaces for virtual machine (VM) placement and migration, CPU capping, memory ballooning, page sharing, and I/O throttling, limiting the ways in which applications can op…

Follow the Sun through the Clouds

Zhiming Shen, Qin Jia, Gur-Eyal Sela, Ben Rainero, Weijia Song, et al. · 2016

Global cloud services have to respond to workloads that shift geographically as a function of time-of-day or in response to special events. While many such services have support for adding nodes in one region and removing nodes in another,…

Gur-Eyal Sela