Sameh Elnikety
Ascendra: Dynamic Request Prioritization for Efficient LLM Serving
The rapid advancement of Large Language Models (LLMs) has driven the need for more efficient serving strategies. In this context, efficiency refers to the proportion of requests that meet their Service Level Objectives (SLOs), particularly…
Junctiond: Extending FaaS Runtimes with Kernel-Bypass
This report explores the use of kernel-bypass networking in FaaS runtimes and demonstrates how using Junction, a novel kernel-bypass system, as the backend for executing components in faasd can enhance performance and isolation. Junction a…
Analytically-Driven Resource Management for Cloud-Native Microservices
Resource management for cloud-native microservices has attracted a lot of recent attention. Previous work has shown that machine learning (ML)-driven approaches outperform traditional techniques, such as autoscaling, in terms of both SLA m…
WISEFUSE
We characterize production workloads of serverless DAGs at a major cloud provider. Our analysis highlights two major factors that limit performance: (a) lack of efficient communication methods between the serverless functions in the DAG, a…
Parslo
Modern cloud services are implemented as graphs of loosely-coupled microservices to improve programmability, reliability, and scalability. Service Level Objectives (SLOs) define end-to-end latency targets for the entire service to ensure u…
Parallel Discovery of Trajectory Companion Pattern and System Evaluation
Trajectories consist of spatial information of moving objects. Over continuous time spans, trajectory data form data streams constantly generated from diverse and geographically distributed sources. Discovery of traveling patterns on traject…
PerfIso: performance isolation for commercial latency-sensitive services
Large commercial latency-sensitive services, such as web search, run on dedicated clusters provisioned for peak load to ensure responsiveness and tolerate data center outages. As a result, the average load is far lower than the peak load u…
Swayam
Developers use Machine Learning (ML) platforms to train ML models and then deploy these ML models as web services for inference (prediction). A key challenge for platform providers is to guarantee response-time Service Level Agreements (SL…
Exploiting heterogeneity for tail latency and energy efficiency
Interactive service providers have strict requirements on high-percentile (tail) latency to meet user expectations. If providers meet tail latency targets with less energy, they increase profits, because energy is a significant operating e…
BitFunnel
Since the mid-90s there has been a widely-held belief that signature files are inferior to inverted files for text indexing. In recent years the Bing search engine has developed and deployed an index based on bit-sliced signatures. This in…
Optimal Reissue Policies for Reducing Tail Latency
Interactive services send redundant requests to multiple different replicas to meet stringent tail latency requirements. These additional (reissue) requests mitigate the impact of non-deterministic delays within the system and thus incre…
Obtaining and Managing Answer Quality for Online Data-Intensive Services
Online data-intensive (OLDI) services use anytime algorithms to compute over large amounts of data and respond quickly. Interactive response times are a priority, so OLDI services parallelize query execution across distributed software com…
GeoTrend
This paper presents GeoTrend, a system for scalable support of spatial trend discovery on recent microblogs, e.g., tweets and online reviews, that come in real time. GeoTrend is distinguished from existing techniques in three aspects: (1) …
Work stealing for interactive services to meet target latency
Interactive web services increasingly drive critical business workloads such as search, advertising, games, shopping, and finance. Whereas optimizing parallel programs and distributed server systems have historically focused on average lat…
Measuring and Managing Answer Quality for Online Data-Intensive Services
Online data-intensive services parallelize query execution across distributed software components. Interactive response time is a priority, so online query executions return answers without waiting for slow running components to finish. Ho…