arXiv (Cornell University)
Scalable GPU Performance Variability Analysis Framework
June 2025 • Ankur Lahiry, Ayush Pokharel, Seth Ockerman, Amal Gueroudji, Line Pouchard, Tanzima Islam
Analyzing large-scale performance logs from GPU profilers often requires terabytes of memory and hours of runtime, even for basic summaries. These constraints prevent timely insight and hinder the integration of performance analytics into automated workflows. Existing analysis tools typically process data sequentially, making them ill-suited for HPC workflows with growing trace complexity and volume. We introduce a distributed data analysis framework that scales with dataset size and compute availability. Rather t…