Yanfang Le
STrack: A Reliable Multipath Transport for AI/ML Clusters
Emerging artificial intelligence (AI) and machine learning (ML) workloads present new challenges for managing the collective communication used in distributed training across hundreds or even thousands of GPUs. This paper presents STrack, a…
FASTFLOW: Flexible Adaptive Congestion Control for High-Performance Datacenters
The increasing demand from machine learning (ML) workloads in datacenters places significant stress on current congestion control (CC) algorithms, many of which struggle to maintain performance at scale. These workloads generate bursty, sync…
Towards Accelerating Data Intensive Application's Shuffle Process Using SmartNICs
The wide adoption of the emerging SmartNIC technology creates new opportunities to offload application-level computation into the networking layer, which relieves the burden on host CPUs and improves performance. Shuffle, the all-t…
SFC: Near-Source Congestion Signaling and Flow Control
State-of-the-art congestion control algorithms for data centers alone do not cope well with transient congestion and high traffic bursts. To help with these, we revisit the concept of direct backward feedback from switches and propo…
Efficient Data-Plane Memory Scheduling for In-Network Aggregation
As the scale of distributed training grows, communication becomes a bottleneck. To accelerate communication, recent works introduce In-Network Aggregation (INA), which moves gradient summation into network middle-boxes, e.g., prog…
PL2: Towards Predictable Low Latency in Rack-Scale Networks
High-performance rack-scale offerings package disaggregated pools of compute, memory and storage hardware in a single rack to run diverse workloads with varying requirements, including applications that need low and predictable latency. Th…
RoGUE
RDMA over Converged Ethernet (RoCE) promises low latency and low CPU utilization over commodity networks, and is attractive for cloud infrastructure services. Current implementations require Priority Flow Control (PFC) that uses backpressu…