Nathan Sheeley

Kernel Looping: Eliminating Synchronization Boundaries for Peak Inference Performance

David Koeplinger, Darshan Gandhi, Pushkar Nandkar, Nathan Sheeley, Matheen Musaddiq, et al. · 2024

Computer science · Mathematics

Token generation speed is critical to power the next wave of AI inference applications. GPUs significantly underperform during token generation due to synchronization overheads at kernel boundaries, utilizing only 21% of their peak memory …
