Holger Fröning
Scalable and Efficient Intra- and Inter-node Interconnection Networks for Post-Exascale Supercomputers and Data centers
The rapid growth of data-intensive applications such as generative AI, scientific simulations, and large-scale analytics is driving modern supercomputers and data centers toward increasingly heterogeneous and tightly integrated architectur…
Variance-Aware Noisy Training: Hardening DNNs against Unstable Analog Computations
The disparity between the computational demands of deep learning and the capabilities of compute hardware is expanding drastically. Although deep learning achieves remarkable performance in countless tasks, its escalating requirements for …
On Hardening DNNs against Noisy Computations
The success of deep learning has sparked significant interest in designing computer hardware optimized for the high computational demands of neural network inference. As further miniaturization of digital CMOS processors becomes increasing…
Performance of the ATLAS GNN4ITk Particle Track Reconstruction GPU pipeline
With the upcoming upgrade of High Luminosity LHC, existing algorithms of the ATLAS Trigger System will demand increasing computational power by more than an order of magnitude. Therefore, alternative reconstruction techniques are explored …
Function Space Diversity for Uncertainty Prediction via Repulsive Last-Layer Ensembles
Bayesian inference in function space has gained attention due to its robustness against overparameterization in neural networks. However, approximating the infinite-dimensional function space introduces several challenges. In this work, we…
Probabilistic photonic computing with chaotic light
Less Memory Means smaller GPUs: Backpropagation with Compressed Activations
The ever-growing scale of deep neural networks (DNNs) has led to an equally rapid growth in computational resource requirements. Many recent architectures, most prominently Large Language Models, have to be trained using supercomputers wi…
Probabilistic Photonic Computing with Chaotic Light
Biological neural networks effortlessly tackle complex computational problems and excel at predicting outcomes from noisy, incomplete data, a task that poses significant challenges to traditional processors. Artificial neural networks (ANN…
DeepHYDRA: A Hybrid Deep Learning and DBSCAN-Based Approach to Time-Series Anomaly Detection in Dynamically-Configured Systems
Anomaly detection in distributed systems such as High-Performance Computing (HPC) clusters is vital for early fault detection, performance optimisation, security monitoring, and reliability in general, as well as for operational insights. Deep Neural…
GraphMatch: Subgraph Query Processing on FPGAs
Efficiently finding subgraph embeddings in large graphs is crucial for many application areas like biology and social network analysis. Set intersections are the predominant and most challenging aspect of current join-based subgraph query …
Random telegraph noise characteristic of nonvolatile resistive random access memories based on optical interference principle
The influence of random telegraph noise (RTN) could reduce the reading margin, which would cause computational errors in data recognition. This paper proposes a current sensor based on the principle of optical fiber interference, which can…
Implications of Noise in Resistive Memory on Deep Neural Networks for Image Classification
Resistive memory is a promising alternative to SRAM, but is also an inherently unstable device that requires substantial effort to ensure correct read and write operations. To avoid the associated costs in terms of area, time and energy, t…
Compressing the Backward Pass of Large-Scale Neural Architectures by Structured Activation Pruning
The rise of Deep Neural Networks (DNNs) has led to an increase in model size and complexity, straining the memory capacity of GPUs. Sparsity in DNNs, characterized as structural or ephemeral, has gained attention as a solution. This work f…
CLAIRE-ROP: Rapid Partitioning-based Deformable Image Registration on Multi-GPU Accelerator
Deformable image registration (DIR) is an important tool for clinical applications, especially in medical imaging and radiotherapy, as it ensures accurate image alignment and analysis. Given the high computational demands, acceleration of …
On Performance Analysis of Graphcore IPUs: Analyzing Squared and Skewed Matrix Multiplication
In recent decades, High Performance Computing (HPC) has undergone significant enhancements, particularly in the realm of hardware platforms, aimed at delivering increased processing power while keeping power consumption within reasonable l…
On the Non-Associativity of Analog Computations
The energy efficiency of analog forms of computing makes it one of the most promising candidates to deploy resource-hungry machine learning tasks on resource-constrained systems such as mobile or embedded devices. However, it is well known …
Reducing Memory Requirements for the IPU using Butterfly Factorizations
High Performance Computing (HPC) has benefited from various improvements over the last decades, especially in hardware platforms that provide more processing power while keeping power consumption at a reasonable level. The Intelli…
GraphScale: Scalable Processing on FPGAs for HBM and Large Graphs
Recent advances in graph processing on FPGAs promise to alleviate performance bottlenecks with irregular memory access patterns. Such bottlenecks challenge performance for a growing number of important application areas like machine learni…
Implementation Techniques for SPMD Kernels on CPUs
More and more frameworks and simulations are developed using heterogeneous programming models such as OpenCL, SYCL, CUDA, or HIP. A significant hurdle to mapping these models to CPUs in a performance-portable manner is that implementing wo…
Walking Noise: On Layer-Specific Robustness of Neural Architectures against Noisy Computations and Associated Characteristic Learning Dynamics
Deep neural networks are extremely successful in various applications; however, they exhibit high computational demands and energy consumption. This is exacerbated by stuttering technology scaling, prompting the need for novel approaches to…
Towards Hardware-Specific Automatic Compression of Neural Networks
Compressing neural network architectures is important to allow the deployment of models to embedded or mobile devices, and pruning and quantization are the major approaches to compress neural networks nowadays. Both methods benefit when co…
GraphScale: Scalable Bandwidth-Efficient Graph Processing on FPGAs
Recent advances in graph processing on FPGAs promise to alleviate performance bottlenecks with irregular memory access patterns. Such bottlenecks challenge performance for a growing number of important application areas like machine learni…
HW-Aware Initialization of DNN Auto-Tuning to Improve Exploration Time and Robustness
The process of optimizing the latency of DNN operators with ML models and hardware-in-the-loop, called auto-tuning, has established itself as a pervasive method for the deployment of neural networks. From a search space of loop-optimizatio…
Joint Program and Layout Transformations to Enable Convolutional Operators on Specialized Hardware Based on Constraint Programming
The success of Deep Artificial Neural Networks (DNNs) in many domains created a rich body of research concerned with hardware accelerators for compute-intensive DNN operators. However, implementing such operators efficiently with complex h…
Scheduling of Graph Queries: Controlling Intra- and Inter-query Parallelism for a High System Throughput
The vast amounts of data used in social, business or traffic networks, biology and other natural sciences are often managed in graph-based data sets, consisting of a few thousand up to billions and trillions of vertices and edges, respecti…
Characterization of data compression across CPU platforms and accelerators
The ever increasing amount of generated data makes it more and more beneficial to utilize compression to trade computations for data movement and reduced storage requirements. Lately, dedicated accelerators have been introduced to offload …
Demystifying memory access patterns of FPGA-based graph processing accelerators
Recent advances in reprogrammable hardware (e.g., FPGAs) and memory technology (e.g., DDR4, HBM) promise to solve performance problems inherent to graph processing like irregular memory access patterns on traditional hardware (e.g., CPU). …