Sameh Elnikety
Ascendra: Dynamic Request Prioritization for Efficient LLM Serving
The rapid advancement of Large Language Models (LLMs) has driven the need for more efficient serving strategies. In this context, efficiency refers to the proportion of requests that meet their Service Level Objectives (SLOs), particularly…
Junctiond: Extending FaaS Runtimes with Kernel-Bypass
This report explores the use of kernel-bypass networking in FaaS runtimes and demonstrates how using Junction, a novel kernel-bypass system, as the backend for executing components in faasd can enhance performance and isolation. Junction a…
Analytically-Driven Resource Management for Cloud-Native Microservices
Resource management for cloud-native microservices has attracted a lot of recent attention. Previous work has shown that machine learning (ML)-driven approaches outperform traditional techniques, such as autoscaling, in terms of both SLA m…
WISEFUSE
We characterize production workloads of serverless DAGs at a major cloud provider. Our analysis highlights two major factors that limit performance: (a) lack of efficient communication methods between the serverless functions in the DAG, a…
Parslo
Modern cloud services are implemented as graphs of loosely-coupled microservices to improve programmability, reliability, and scalability. Service Level Objectives (SLOs) define end-to-end latency targets for the entire service to ensure u…
Parallel Discovery of Trajectory Companion Pattern and System Evaluation
Trajectories consist of spatial information of moving objects. Over continuous time spans, trajectory data form data streams constantly generated from diverse and geographically distributed sources. Discovery of traveling patterns on traject…
PerfIso: performance isolation for commercial latency-sensitive services
Large commercial latency-sensitive services, such as web search, run on dedicated clusters provisioned for peak load to ensure responsiveness and tolerate data center outages. As a result, the average load is far lower than the peak load u…
Swayam
Developers use Machine Learning (ML) platforms to train ML models and then deploy these ML models as web services for inference (prediction). A key challenge for platform providers is to guarantee response-time Service Level Agreements (SL…
Exploiting heterogeneity for tail latency and energy efficiency
Interactive service providers have strict requirements on high-percentile (tail) latency to meet user expectations. If providers meet tail latency targets with less energy, they increase profits, because energy is a significant operating e…
BitFunnel
Since the mid-90s there has been a widely-held belief that signature files are inferior to inverted files for text indexing. In recent years the Bing search engine has developed and deployed an index based on bit-sliced signatures. This in…
Optimal Reissue Policies for Reducing Tail Latency
Interactive services send redundant requests to multiple different replicas to meet stringent tail latency requirements. These additional (reissue) requests mitigate the impact of non-deterministic delays within the system and thus incre…
Obtaining and Managing Answer Quality for Online Data-Intensive Services
Online data-intensive (OLDI) services use anytime algorithms to compute over large amounts of data and respond quickly. Interactive response times are a priority, so OLDI services parallelize query execution across distributed software com…
GeoTrend
This paper presents GeoTrend, a system for scalable support of spatial trend discovery on recent microblogs, e.g., tweets and online reviews, that come in real time. GeoTrend is distinguished from existing techniques in three aspects: (1) …
Work stealing for interactive services to meet target latency
Interactive web services increasingly drive critical business workloads such as search, advertising, games, shopping, and finance. Whereas optimizing parallel programs and distributed server systems have historically focused on average lat…
Measuring and Managing Answer Quality for Online Data-Intensive Services
Online data-intensive services parallelize query execution across distributed software components. Interactive response time is a priority, so online query executions return answers without waiting for slow running components to finish. Ho…