Explanipedia

MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems Open

Yao Fu, Leyang Xue, Man-Kit Sit, Dong Li, Zhixin Miao, et al. · 2025

The sparse Mixture-of-Experts (MoE) architecture is increasingly favored for scaling Large Language Models (LLMs) efficiently, but it depends on heterogeneous compute and memory resources. These factors jointly affect system Cost, Accuracy…

MoE-Gen: High-Throughput MoE Inference on a Single GPU with Module-Based Batching Open

Tairan Xu, Leyang Xue, Zhan Lu, William Henry Jackson, Luo Mai · 2025

This paper presents MoE-Gen, a high-throughput MoE inference system optimized for single-GPU execution. Existing inference systems rely on model-based or continuous batching strategies, originally designed for interactive inference, which …

MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation Open

Jinbo Xing, Luo Mai, Cusuh Ham, Jiahui Huang, Aniruddha Mahapatra, et al. · 2025

Computer science Materials science

This paper presents a method that allows users to design cinematic video shots in the context of image-to-video generation. Shot design, a critical aspect of filmmaking, involves meticulously planning both camera movements and object motio…

Pushing the Boundaries of State Space Models for Image and Video Generation Open

Yicong Hong, Luo Mai, Yuan Yao, Feng Liu · 2025

Computer science

While Transformers have become the dominant architecture for visual generation, linear attention models, such as the state-space models (SSM), are increasingly recognized for their efficiency in processing long visual sequences. However, t…

Progressive Growing of Video Tokenizers for Temporally Compact Latent Spaces Open

Ashoka Mahapatra, Luo Mai, Yitian Zhang, David Bourgin, Feng Liu · 2025

Computer science

Video tokenizers are essential for latent video diffusion models, converting raw video data into spatiotemporally compressed latent spaces for efficient training. However, extending state-of-the-art video tokenizers to achieve a temporal c…

GaussianVideo: Efficient Video Representation via Hierarchical Gaussian Splatting Open

Andrew D. Bond, Jui-Hsien Wang, Luo Mai, Erkut Erdem, Aykut Erdem · 2025

Computer science Political science Physics

Efficient neural representations for dynamic video scenes are critical for applications ranging from video compression to interactive simulations. Yet, existing methods often face challenges related to high memory usage, lengthy training t…

Mycosphere Notes 521–571: A special edition of fungal biodiversity to celebrate Kevin D. Hyde's 70th birthday and his exceptional contributions to Mycology Open

Sinang Hongsanan, Surapong Khuna, Ishara S. Manawasinghe, Saowaluck Tibpromma, KWT Chethana, et al. · 2025

Biology Computer science

This special edition of Mycosphere Notes commemorates the 70th birthday of Kevin D. Hyde, a seminal figure in fungal taxonomy whose work has profoundly influenced the study of fungal diversity and classification. In this paper, we provide …

TAB: Transformer Attention Bottlenecks enable User Intervention and Debugging in Vision-Language Models Open

Pooyan Rahmanzadehgrevi, Hao G. Nguyen, Rosanne Liu, Luo Mai, Anh Nguyen · 2024

Computer science Engineering

Multi-head self-attention (MHSA) is a key component of Transformers, a widely popular architecture in both language and vision. Multiple heads intuitively enable different parallel processes over the same input. Yet, they also obscure the …

MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems Open

Yao Fu, Yu Jiang, Yeqi Huang, Ping Nie, Zhan Lü, et al. · 2024

Computer science Business Engineering

The sparse Mixture-of-Experts (MoE) architecture is increasingly favored for scaling Large Language Models (LLMs) efficiently, but it depends on heterogeneous compute and memory resources. These factors jointly affect system Cost, Accuracy…

ServerlessLLM: Low-Latency Serverless Inference for Large Language Models Open

Yao Fu, Leyang Xue, Yeqi Huang, Andrei-Octavian Brabete, Dmitrii Ustiugov, et al. · 2024

Computer science Philosophy

This paper presents ServerlessLLM, a distributed system designed to support low-latency serverless inference for Large Language Models (LLMs). By harnessing the substantial near-GPU storage and memory capacities of inference servers, Serve…

Tenplex: Dynamic Parallelism for Deep Learning using Parallelizable Tensor Collections Open

Marcel Wagenländer, Li Guo, Bo Zhao, Luo Mai, Peter Pietzuch · 2023

Computer science Mathematics

Deep learning (DL) jobs use multi-dimensional parallelism, i.e. combining data, model, and pipeline parallelism, to use large GPU clusters efficiently. Long-running jobs may experience changes to their GPU allocation: (i) resource elastici…

GEAR: A GPU-Centric Experience Replay System for Large Reinforcement Learning Models Open

Hanjing Wang, Man-Kit Sit, Congjie He, Ying Wen, Weinan Zhang, et al. · 2023

Computer science

This paper introduces a distributed, GPU-centric experience replay system, GEAR, designed to perform scalable reinforcement learning (RL) with large sequence models (such as transformers). With such models, existing systems such as Reverb …

Large Sequence Models for Sequential Decision-Making: A Survey Open

Muning Wen, Runji Lin, Hanjing Wang, Yaodong Yang, Ying Wen, et al. · 2023

Computer science Engineering Biology

Transformer architectures have facilitated the development of large-scale and general-purpose sequence models for prediction tasks in natural language processing and computer vision, e.g., GPT-3 and Swin Transformer. Although originally de…

Quiver: Supporting GPUs for Low-Latency, High-Throughput GNN Serving with Workload Awareness Open

Zeyuan Tan, Xiulong Yuan, Congjie He, Man-Kit Sit, Li Guo, et al. · 2023

Computer science Mathematics

Systems for serving inference requests on graph neural networks (GNN) must combine low latency with high throughput, but they face irregular computation due to skew in the number of sampled graph nodes and aggregated GNN features. This mak…

TorchOpt: An Efficient Library for Differentiable Optimization Open

Jie Ren, Xidong Feng, Bo Liu, Xuehai Pan, Yao Fu, et al. · 2022

Computer science Mathematics Philosophy

Recent years have witnessed the booming of various differentiable optimization algorithms. These algorithms exhibit different execution patterns, and their execution needs massive computational resources that go beyond a single CPU and GPU…

A Theoretical Understanding of Gradient Bias in Meta-Reinforcement Learning Open

Bo Liu, Xidong Feng, Jie Ren, Luo Mai, Rui Zhu, et al. · 2021

Mathematics Computer science Physics

Gradient-based Meta-RL (GMRL) refers to methods that maintain two-level optimisation procedures wherein the outer-loop meta-learner guides the inner-loop gradient-based reinforcement learner to achieve fast adaptations. In this paper, we d…

MegBA: A GPU-Based Distributed Library for Large-Scale Bundle Adjustment Open

Jie Ren, Wenteng Liang, Ran Yan, Luo Mai, Shiwen Liu, et al. · 2021

Computer science Engineering Physics

Large-scale Bundle Adjustment (BA) requires massive memory and computation resources, which existing BA libraries struggle to provide. In this paper, we propose MegBA, a GPU-based distributed BA library. MegBA can provide massi…

Fast and Flexible Human Pose Estimation with HyperPose Open

Yixiao Guo, Jiawei Liu, Li Guo, Luo Mai, Hao Dong · 2021

Computer science Engineering

Estimating human pose is an important yet challenging task in multimedia applications. Existing pose estimation libraries target reproducing standard pose estimation algorithms. When it comes to customising these algorithms for real-wor…

Parallel Fully Convolutional Network for Semantic Segmentation Open

Jian Ji, Xiaocong Lu, Luo Mai, Minghui Yin, Qiguang Miao, et al. · 2020

Computer science

Fully convolutional networks (FCNs) have been widely applied for dense classification tasks such as semantic segmentation. As a large number of works based on FCNs are proposed, various semantic segmentation models have been improved signi…

Move Fast and Meet Deadlines: Fine-grained Real-time Stream Processing with Cameo Open

Le Xu, Shivaram Venkataraman, Indranil Gupta, Luo Mai, Rahul Potharaju · 2020

Computer science

Resource provisioning in multi-tenant stream processing systems faces the dual challenges of keeping resource utilization high (without over-provisioning), and ensuring performance isolation. In our common production use cases, where strea…

Efficient Reinforcement Learning Development with RLzoo Open

Zihan Ding, Tianyang Yu, Yanhua H. Huang, Hongming Zhang, Li Guo, et al. · 2020

Computer science Engineering

Many researchers and developers are exploring the adoption of Deep Reinforcement Learning (DRL) techniques in their applications. However, they often find such adoption challenging. Existing DRL libraries provide poor support for prototypin…

RLzoo: A Comprehensive and Adaptive Reinforcement Learning Library. Open

Zihan Ding, Tianyang Yu, Yanhua H. Huang, Hongming Zhang, Luo Mai, et al. · 2020

Computer science Philosophy

Recently, we have seen a rapidly growing adoption of Deep Reinforcement Learning (DRL) technologies. Fully achieving the promise of these technologies in practice is, however, extremely difficult. Users have to invest tremendous efforts in…

KungFu: Making Training in Distributed Machine Learning Adaptive Open

Luo Mai, Li Guo, Marcel Wagenländer, Konstantinos Fertakis, Andrei-Octavian Brabete, et al. · 2020

Computer science Psychology Physics

When using distributed machine learning (ML) systems to train models on a cluster of worker machines, users must configure a large number of parameters: hyper-parameters (e.g. the batch size and the learning rate) affect model convergence…

CROSSBOW: Scaling Deep Learning with Small Batch Sizes on Multi-GPU Servers Open

Alexandros Koliousis, Pijika Watcharapichat, Matthias Weidlich, Luo Mai, Paolo Costa, et al. · 2019

Computer science Mathematics Physics

Deep learning models are trained on servers with many GPUs, and training must scale with the number of GPUs. Systems such as TensorFlow and Caffe2 train models with parallel synchronous stochastic gradient descent: they process a batch of …

Towards efficient big data processing in data centres Open

Luo Mai · 2017

Computer science

Large data processing systems require a high degree of coordination, and exhibit network bottlenecks due to massive communication data. This motivates my PhD study to propose system control mechanisms that improve monitoring and coordinati…

TensorLayer: A Versatile Library for Efficient Deep Learning Development Open

Hao Dong, Akara Supratak, Luo Mai, Fangde Liu, Axel Oehmichen, et al. · 2017

Computer science

Deep learning has enabled major advances in the fields of computer vision, natural language processing, and multimedia among many others. Developing a deep learning system is arduous and complex, as it involves constructing neural network …

Emu: Rapid Prototyping of Networking Services Open

Nik Sultana, Salvator Galea, David Greaves, Marcin Wójcik, Jonny Shipton, et al. · 2017

Computer science Mathematics

Due to their performance and flexibility, FPGAs are an attractive platform for the execution of network functions. It has been a challenge for a long time though to make FPGA programming accessible to a large audience of developers. An app…

Extending programs with debug-related features, with application to hardware development Open

Nik Sultana, Salvator Galea, David Greaves, Marcin Wójcik, Noa Zilberman, et al. · 2017

Computer science Mathematics

The capacity and programmability of reconfigurable hardware such as FPGAs has improved steadily over the years, but they do not readily provide any mechanisms for monitoring or debugging running programs. Such mechanisms need to be written…

Luo Mai