Explanipedia

TensorRight: Automated Verification of Tensor Graph Rewrites

Jai Arora, Sirui Lu, Devansh Jain, Tianfan Xu, Farzin Houshmand, et al. · 2025

Computer science Biology

Tensor compilers, essential for generating efficient code for deep learning models across various applications, employ tensor graph rewrites as one of the key optimizations. These rewrites optimize tensor computational graphs with the expe…

Overlap Communication with Dependent Computation via Decomposition in Large Deep Learning Models

Shibo Wang, Jinliang Wei, Amit Sabne, Andy Davis, Berkin Ilbeyi, et al. · 2022

Computer science Biology Chemistry

Large deep learning models have shown great potential with state-of-the-art results in many tasks. However, running these large models is quite challenging on an accelerator (GPU or TPU) because the on-device memory is too limited for the …

A Learned Performance Model for Tensor Processing Units

Sam Kaufman, Phitchaya Mangpo Phothilimthana, Yanqi Zhou, Charith Mendis, Sudip Roy, et al. · 2021

Computer science Mathematics

Accurate hardware performance models are critical to efficient code generation. They can be used by compilers to make heuristic decisions, by superoptimizers as a minimization objective, or by autotuners to find an optimal configuration fo…

A Learned Performance Model for Tensor Processing Units

Samuel J. Kaufman, Phitchaya Mangpo Phothilimthana, Yanqi Zhou, Charith Mendis, Sudip Roy, et al. · 2020

Computer science Mathematics

Accurate hardware performance models are critical to efficient code generation. They can be used by compilers to make heuristic decisions, by superoptimizers as a minimization objective, or by autotuners to find an optimal configuration fo…

Fast Distributed Bandits for Online Recommendation Systems

Kanak Mahadik, Qingyun Wu, Shuai Li, Amit Sabne · 2020

Computer science Psychology

Contextual bandit algorithms are commonly used in recommender systems, where content popularity can change rapidly. These algorithms continuously learn latent mappings between users and items, based on contexts associated with them both. R…

RegDem: Increasing GPU Performance via Shared Memory Register Spilling

Putt Sakdhnagool, Amit Sabne, Rudolf Eigenmann · 2019

Computer science Mathematics Philosophy

GPU utilization, measured as occupancy, is limited by the parallel threads' combined usage of on-chip resources, such as registers and the programmer-managed shared memory. Higher resource demand means lower effective parallel thread count…

Massively parallel 3D image reconstruction

Xiao Wang, Amit Sabne, Putt Sakdhnagool, Sherman J. Kisner, Charles A. Bouman, et al. · 2017

Computer science Materials science Economics

Computed Tomographic (CT) image reconstruction is an important technique used in a wide range of applications. Among reconstruction methods, Model-Based Iterative Reconstruction (MBIR) is known to produce much higher quality CT images; how…

Evaluating Performance Portability of OpenACC

Amit Sabne, Putt Sakdhnagool, Seyong Lee, Jeffrey S. Vetter · 2016

Computer science

Accelerator-based heterogeneous computing is gaining momentum in the High Performance Computing arena. However, the increased complexity of accelerator architectures demands more generic, high-level programming models. OpenACC is one such …

Programming models, compilers, and runtime systems for accelerator computing

Amit Sabne · 2016

Computer science

Accelerators, such as GPUs and Intel Xeon Phis, have become the workhorses of high-performance computing. Typically, the accelerators act as co-processors, with discrete memory spaces. They possess massive parallelism, along with many othe…

Amit Sabne