Efficient warp execution in presence of divergence with collaborative context collection

Farzad Khorasani, Rajiv Gupta, Laxmi N. Bhuyan

2015 · Open Access · DOI: https://doi.org/10.1145/2830772.2830796

The GPU's SIMD architecture is a double-edged sword for parallel tasks with control flow divergence. On the one hand, it provides a high-performance yet power-efficient platform to accelerate applications via massive parallelism; on the other hand, irregularities induce inefficiencies due to the warp's lockstep traversal of all diverging execution paths. In this work, we present a software (compiler) technique named Collaborative Context Collection (CCC) that increases the warp execution efficiency when faced with thread divergence incurred either by different intra-warp task assignment or by intra-warp load imbalance. CCC collects the relevant registers of divergent threads in a warp-specific stack allocated in the fast shared memory, and restores them only when perfect utilization of the warp lanes becomes feasible. We propose code transformations that make CCC applicable to a variety of program segments with thread divergence. We also introduce optimizations to reduce the cost of CCC and to avoid device occupancy limitation or memory divergence. We have developed a framework that automates the application of CCC to CUDA-generated intermediate PTX code. We evaluated CCC on real-world applications and on multiple scenarios using synthetic programs. CCC improves the warp execution efficiency of real-world benchmarks by up to 56% and achieves an average speedup of 1.69x (maximum 3.08x).
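The collect-then-restore idea described in the abstract can be illustrated with a small host-side simulation. The sketch below is a toy model based only on the abstract, not the authors' implementation or their PTX transformation. It assumes a warp of 32 lanes, treats each saved context as a simple counter entry rather than a set of registers in shared memory, and ignores the cost of saving and restoring contexts. The function names `baseline_efficiency` and `ccc_efficiency` are hypothetical.

```python
WARP_SIZE = 32  # assumed warp width


def baseline_efficiency(predicates):
    """Fraction of useful lanes when the warp runs the divergent path
    in every iteration where at least one lane needs it.

    predicates: list of iterations, each a list of WARP_SIZE booleans
    (True means the lane must take the divergent path).
    """
    active = issued = 0
    for lanes in predicates:
        n = sum(lanes)
        if n:  # the whole warp steps through the path, idle lanes included
            active += n
            issued += WARP_SIZE
    return active / issued if issued else 1.0


def ccc_efficiency(predicates):
    """Same workload, but divergent lanes defer their work until a full
    warp's worth of contexts is available (toy model of the CCC idea)."""
    stack = 0  # number of saved contexts; stands in for the per-warp stack
    active = issued = 0
    for lanes in predicates:
        n = sum(lanes)
        if n + stack >= WARP_SIZE:
            # Idle lanes pop saved contexts so that all lanes do useful work.
            stack -= WARP_SIZE - n
            active += WARP_SIZE
            issued += WARP_SIZE
        else:
            # Not enough work to fill the warp: save contexts and skip the path.
            stack += n
    if stack:
        # Leftover contexts are drained in one partially filled pass.
        active += stack
        issued += WARP_SIZE
    return active / issued if issued else 1.0


# Example: 8 iterations in which only lanes 0..7 take the divergent path.
workload = [[lane < 8 for lane in range(WARP_SIZE)] for _ in range(8)]
print(baseline_efficiency(workload))  # 0.25
print(ccc_efficiency(workload))       # 1.0
```

In this model both variants perform the same total amount of useful work; the deferred variant simply issues the divergent path fewer times. A real implementation would also have to account for the save and restore overhead, shared memory capacity, and occupancy, which the abstract says the authors address with additional optimizations.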

Concepts

Computer science · Parallel computing · Thread (computing) · Speedup · Compiler · Control flow · SIMD · Context switch · Instruction set · Multithreading · Execution model · CUDA · Operating system · Programming language

Metadata

Type: article
Language: en
Landing Page: https://doi.org/10.1145/2830772.2830796
PDF: https://dl.acm.org/doi/pdf/10.1145/2830772.2830796
OA Status: gold
Cited By: 34
References: 51
Related Works: 10
OpenAlex ID: https://openalex.org/W2236252626
