David Koeplinger
Kernel Looping: Eliminating Synchronization Boundaries for Peak Inference Performance
Token generation speed is critical to power the next wave of AI inference applications. GPUs significantly underperform during token generation due to synchronization overheads at kernel boundaries, utilizing only 21% of their peak memory …
Practical Design Space Exploration
Multi-objective optimization is a crucial matter in computer systems design space exploration because real-world applications often rely on a trade-off between several objectives. Derivatives are usually not available or impractical to com…
Spatial: a language and compiler for application accelerators
Industry is increasingly turning to reconfigurable architectures like FPGAs and CGRAs for improved performance and energy efficiency. Unfortunately, adoption of these architectures has been limited by their programming models. HDLs lack ab…
Plasticine
Reconfigurable architectures have gained popularity in recent years as they allow the design of energy-efficient accelerators. Fine-grain fabrics (e.g. FPGAs) have traditionally suffered from performance and power inefficiencies due to bit…
Automatic Generation of Efficient Accelerators for Reconfigurable Hardware
Acceleration in the form of customized datapaths offers large performance and energy improvements over general purpose processors. Reconfigurable fabrics such as FPGAs are gaining popularity for use in implementing application-specific acce…
Generating Configurable Hardware from Parallel Patterns
In recent years the computing landscape has seen an increasing shift towards specialized accelerators. Field programmable gate arrays (FPGAs) are particularly promising for the implementation of these accelerators, as they offer significan…