arXiv (Cornell University)
Chopper: A Multi-Level GPU Characterization Tool & Derived Insights Into LLM Training Inefficiency
December 2025 • Marco Kurzynski, Shaizeen Aga, Di Wu
Training large language models (LLMs) efficiently requires a deep understanding of how modern GPU systems behave under real-world distributed training workloads. While prior work has focused primarily on kernel-level performance or single-GPU microbenchmarks, the complex interaction between communication, computation, memory behavior, and power management in multi-GPU LLM training remains poorly characterized. In this work, we introduce Chopper, a profiling and analysis framework that collects, aligns, and visualizes …
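The abstract describes a framework that collects and aligns computation, communication, memory, and power signals across a multi-GPU training run. As a rough illustration only, and not Chopper's actual interface, the Python sketch below samples per-GPU power, memory, and utilization counters via NVML on a background thread while tagging each training step with an NVTX range, so the counter samples can later be lined up against kernel and communication traces; every function and variable name here is hypothetical.

```python
# Hypothetical sketch of multi-level GPU characterization (not Chopper's API).
# Assumes an NVIDIA GPU with pynvml and PyTorch installed.
import time
import threading

import pynvml
import torch

def sample_gpu_counters(stop_event, samples, device_index=0, period_s=0.05):
    """Poll NVML power/memory/utilization counters with wall-clock timestamps."""
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
    while not stop_event.is_set():
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        samples.append({
            "t": time.time(),  # timestamp used later to align with kernel traces
            "power_w": pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0,
            "mem_used_gib": mem.used / 2**30,
            "sm_util_pct": util.gpu,
        })
        time.sleep(period_s)
    pynvml.nvmlShutdown()

def train_step(model, batch, optimizer):
    """One training step wrapped in an NVTX range so profilers can align it."""
    torch.cuda.nvtx.range_push("train_step")
    loss = model(batch).sum()  # stand-in loss for illustration only
    loss.backward()
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
    torch.cuda.nvtx.range_pop()

if __name__ == "__main__":
    samples, stop = [], threading.Event()
    sampler = threading.Thread(target=sample_gpu_counters, args=(stop, samples))
    sampler.start()

    model = torch.nn.Linear(4096, 4096).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    for _ in range(10):
        train_step(model, torch.randn(64, 4096, device="cuda"), optimizer)
    torch.cuda.synchronize()

    stop.set()
    sampler.join()
    print(f"collected {len(samples)} counter samples")
```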
Computer Science
X-Inefficiency
Architecture
Artificial Intelligence
Parallel Computing
Computer Engineering
Training, Validation, and Test Data Sets
Embedded System
Machine Learning
Data Structure
Visualization (Graphics)