Chuanxiong Guo
Efficient LLM Serving on Hybrid Real-time and Best-effort Requests
Recent breakthroughs in large language models (LLMs) have enabled various generative tasks on a single model. Real-world services (e.g., OpenAI's ChatGPT [27]) powered by an LLM often concurrently support latency-critical requests for inte…
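As a rough illustration of the hybrid setup this abstract describes, the sketch below dispatches latency-critical (real-time) requests ahead of best-effort ones, batching the latter only when no real-time request is waiting. The Dispatcher class and its policy are hypothetical, not the paper's scheduler.

```python
import heapq
import itertools

# Minimal sketch (not the paper's system): real-time (RT) requests are always
# served before best-effort (BE) ones; BE requests are batched only when no
# RT request is pending. All names here are hypothetical.

_seq = itertools.count()

class Dispatcher:
    def __init__(self):
        self._queue = []  # (priority, arrival_order, request)

    def submit(self, request, realtime: bool):
        # Lower tuple sorts first: RT = 0, BE = 1; FIFO within a class.
        heapq.heappush(self._queue, (0 if realtime else 1, next(_seq), request))

    def next_batch(self, max_batch: int):
        """Return the next batch to run: a single RT request as soon as one
        is waiting, otherwise up to max_batch BE requests."""
        if not self._queue:
            return []
        if self._queue[0][0] == 0:  # an RT request is at the head
            return [heapq.heappop(self._queue)[2]]
        batch = []
        while self._queue and self._queue[0][0] == 1 and len(batch) < max_batch:
            batch.append(heapq.heappop(self._queue)[2])
        return batch

d = Dispatcher()
d.submit("summarize log", realtime=False)
d.submit("chat turn", realtime=True)
print(d.next_batch(8))  # ['chat turn'] -- the RT request jumps the BE batch
print(d.next_batch(8))  # ['summarize log']
```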
Collie: Finding Performance Anomalies in RDMA Subsystems
High-speed RDMA networks are getting rapidly adopted in the industry for their low latency and reduced CPU overheads. To verify that RDMA can be used in production, system administrators need to understand the set of application workloads …
dPRO: A Generic Profiling and Optimization System for Expediting Distributed DNN Training
Distributed training using multiple devices (e.g., GPUs) has been widely adopted for learning DNN models over large datasets. However, the performance of large-scale distributed training tends to be far from linear speed-up in practice. Gi…
Aryl: An Elastic Cluster Scheduler for Deep Learning
Companies build separate training and inference GPU clusters for deep learning, and use separate schedulers to manage them. This leads to problems for both training and inference: inference clusters have low GPU utilization when the traffi…
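The capacity-loaning idea motivating Aryl can be sketched in a few lines; the functions and the reserve threshold below are hypothetical illustrations, not Aryl's actual policy.

```python
# Illustrative sketch (not Aryl's algorithm): idle inference-cluster GPUs are
# lent to preemptible training jobs and reclaimed when inference traffic
# returns. All names and thresholds are hypothetical.

def plan_loan(inference_gpus_total, inference_gpus_busy, reserve=2):
    """GPUs that can safely be lent to training right now."""
    idle = inference_gpus_total - inference_gpus_busy
    return max(0, idle - reserve)  # keep a reserve for traffic spikes

def plan_reclaim(loaned, inference_gpus_needed, inference_gpus_free):
    """GPUs to preempt back from training when inference load rises."""
    shortfall = inference_gpus_needed - inference_gpus_free
    return min(loaned, max(0, shortfall))

print(plan_loan(16, 5))       # 9 idle GPUs can be lent
print(plan_reclaim(9, 8, 3))  # reclaim 5 of them on a traffic spike
```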
Prediction of GPU Failures Under Deep Learning Workloads
Graphics processing units (GPUs) are the de facto standard for processing deep learning (DL) tasks. Meanwhile, GPU failures, which are inevitable, cause severe consequences in DL tasks: they disrupt distributed training jobs, crash inference s…
BGL: GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing
Graph neural networks (GNNs) have extended the success of deep neural networks (DNNs) to non-Euclidean graph data, achieving ground-breaking performance on various tasks such as node classification and graph property prediction. Nonetheles…
Serving DNN Models with Multi-Instance GPUs: A Case of the Reconfigurable Machine Scheduling Problem
Multi-Instance GPU (MIG) is a new feature introduced by NVIDIA A100 GPUs that partitions one physical GPU into multiple GPU instances. With MIG, A100 can be the most cost-efficient GPU ever for serving Deep Neural Networks (DNNs). However,…
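To make the scheduling problem concrete, here is a minimal sketch that picks, for each model, the smallest MIG profile meeting its latency SLO. The profile sizes follow A100 MIG (in GPU slices out of 7), but the latency table is invented for illustration and the greedy choice is not the paper's algorithm.

```python
# Sketch of the packing question behind MIG serving (not the paper's
# scheduler): choose the cheapest GPU instance per model, then count slices.

PROFILES = {"1g.5gb": 1, "2g.10gb": 2, "3g.20gb": 3, "4g.20gb": 4, "7g.40gb": 7}

# Hypothetical measured p99 latency (ms) of each model on each profile.
LATENCY = {
    "bert-base": {"1g.5gb": 38, "2g.10gb": 21, "3g.20gb": 15},
    "resnet-50": {"1g.5gb": 12, "2g.10gb": 8},
}

def smallest_profile(model, slo_ms):
    """Smallest (fewest-slice) profile whose measured latency meets the SLO."""
    fits = [p for p, ms in LATENCY[model].items() if ms <= slo_ms]
    if not fits:
        raise ValueError(f"no profile meets {slo_ms} ms for {model}")
    return min(fits, key=PROFILES.__getitem__)

plan = {m: smallest_profile(m, slo)
        for m, slo in [("bert-base", 25), ("resnet-50", 10)]}
slices = sum(PROFILES[p] for p in plan.values())
print(plan, "->", slices, "of 7 slices on one A100")
```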
AutoLRS: Automatic Learning-Rate Schedule by Bayesian Optimization on the Fly
Gradient compression is a widely-established remedy to tackle the communication bottleneck in distributed training of large deep neural networks (DNNs). Under the error-feedback framework, Top-k sparsification, sometimes with k as little a…
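Error-feedback Top-k sparsification, as named in the abstract, can be written compactly; the following is the standard formulation of that technique rather than this paper's specific mechanism.

```python
import numpy as np

# Standard error-feedback Top-k sparsification: each worker sends only the k
# largest-magnitude gradient entries and folds the unsent remainder back into
# the next step's gradient.

def topk_with_error_feedback(grad, residual, k):
    corrected = grad + residual                        # add back past error
    idx = np.argpartition(np.abs(corrected), -k)[-k:]  # k largest by magnitude
    sparse = np.zeros_like(corrected)
    sparse[idx] = corrected[idx]                       # what actually gets sent
    new_residual = corrected - sparse                  # error kept locally
    return sparse, new_residual

rng = np.random.default_rng(0)
residual = np.zeros(1000)
for step in range(3):
    grad = rng.normal(size=1000)
    sparse, residual = topk_with_error_feedback(grad, residual, k=10)  # k = 1%
    print(step, np.count_nonzero(sparse), float(np.abs(residual).sum()))
```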
AutoLRS: Automatic Learning-Rate Schedule by Bayesian Optimization on the Fly
The learning rate (LR) schedule is one of the most important hyper-parameters needing careful tuning in training DNNs. However, it is also one of the least automated parts of machine learning systems and usually costs significant manual ef…
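The trial-and-commit loop underlying automatic LR scheduling can be illustrated on a toy problem. The sketch below substitutes a fixed candidate grid for AutoLRS's Bayesian optimization and a quadratic objective for DNN training, so it shows only the general idea.

```python
import numpy as np

# Toy illustration (greatly simplified; AutoLRS uses Bayesian optimization
# and a loss-forecasting model, neither shown here): every interval, each
# candidate LR is trialed for a few steps from the current iterate, and the
# best-performing one is committed to for the rest of the interval.

def loss(w):  # toy objective: sum of squares
    return float(np.sum(w * w))

def sgd_steps(w, lr, n):
    for _ in range(n):
        w = w - lr * 2 * w  # gradient of sum(w^2) is 2w
    return w

w = np.ones(10)
for interval in range(5):
    candidates = [0.05, 0.2, 0.45]               # BO would propose these
    trials = {lr: loss(sgd_steps(w.copy(), lr, n=3)) for lr in candidates}
    best = min(trials, key=trials.get)           # LR with lowest trial loss
    w = sgd_steps(w, best, n=20)                 # commit for the interval
    print(f"interval {interval}: lr={best} loss={loss(w):.2e}")
```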
Tersecades: Efficient Data Compression in Stream Processing
This work is the first systematic investigation of stream processing with data compression: we have not only identified a set of factors that influence the benefits and overheads of compression, but have also demonstrated that compression …
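One source of the benefit the abstract alludes to is that some operators can run directly on compressed data. The run-length-encoding sketch below is a generic illustration of that idea, not Tersecades' design.

```python
# Generic illustration: an aggregate over run-length-encoded (RLE) data
# touches one run per distinct stretch of values, far fewer elements than
# the decompressed stream contains.

def rle_encode(values):
    runs, prev, count = [], None, 0
    for v in values:
        if v == prev:
            count += 1
        else:
            if prev is not None:
                runs.append((prev, count))
            prev, count = v, 1
    if prev is not None:
        runs.append((prev, count))
    return runs

def rle_sum(runs):
    # Aggregate without decompressing: one multiply-add per run.
    return sum(v * n for v, n in runs)

stream = [3] * 1000 + [7] * 500 + [3] * 250  # sensor-style repetitive data
runs = rle_encode(stream)
print(len(runs), "runs instead of", len(stream), "elements")
print(rle_sum(runs), "==", sum(stream))
```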