Exploring foci of:
arXiv (Cornell University)
FRED: Flexible REduction-Distribution Interconnect and Communication Implementation for Wafer-Scale Distributed Training of DNN Models
June 2024 • Saeed Rashidi, William Won, Sudarshan K. Srinivasan, Puneet Gupta, Tushar Krishna
Distributed Deep Neural Network (DNN) training is a technique to reduce the training overhead by distributing the training tasks across multiple accelerators according to a parallelization strategy. However, high-performance compute and interconnects are needed for maximum speed-up and linear scaling of the system. Wafer-scale systems are a promising technology that allows for tightly integrating high-end accelerators with high-speed wafer-scale interconnects, making it an attractive platform for distributed traini…
PJM Interconnection
Computer Science
Wafer (Electronics)
Embedded System
Engineering
Electrical Engineering
Mathematics
Geography
Cartography
Mathematical Analysis
Meteorology
Geometry