Kevin Barker
5G Energy FRAME Report on 5G for Grid Use Case (Year 3 Final Report)
This report provides an extensive overview of the interrelationships among energy, communication, and computing—especially in the context of decarbonization goals, challenges, and opportunities. Technical examples enabled by 5G technologie…
Evaluating Emerging AI/ML Accelerators: IPU, RDU, and NVIDIA/AMD GPUs
The relentless advancement of artificial intelligence (AI) and machine learning (ML) applications necessitates the development of specialized hardware accelerators capable of handling the increasing complexity and computational demands. Tr…
Beyond the Bridge: Contention-Based Covert and Side Channel Attacks on Multi-GPU Interconnect
High-speed interconnects, such as NVLink, are integral to modern multi-GPU systems, acting as a vital link between CPUs and GPUs. This study highlights the vulnerability of multi-GPU systems to covert and side channel attacks due to conges…
Experiences from the Roadrunner petascale hybrid systems
The combination of flexible microprocessors (AMD Opterons) with high-performing accelerators (IBM PowerXCell 8i) resulted in the extremely powerful Roadrunner system. Many challenges in both hardware and software were overcome to achieve i…
Comparing current cluster, massively parallel, and accelerated systems
Currently there is large architectural diversity in high performance computing systems. They include 'commodity' cluster systems that optimize per-node performance for small jobs, massively parallel processors (MPPs) that optimize aggregat…
The Landscape of Modern Machine Learning: A Review of Machine, Distributed and Federated Learning
With the advance of the powerful heterogeneous, parallel and distributed computing systems and ever increasing immense amount of data, machine learning has become an indispensable part of cutting-edge technology, scientific research and co…
MPGemmFI: A Fault Injection Technique for Mixed Precision GEMM in ML Applications
Emerging deep learning workloads urgently need fast general matrix multiplication (GEMM). To meet such demand, one of the critical features of machine-learning-specific accelerators such as NVIDIA Tensor Cores, AMD Matrix Cores, and Google…
5G Energy FRAME: The Design and Implementation of Data, Model, and Use Case (Year 2 Report)
This report summarizes the Year 2 work of Pacific Northwest National Laboratory’s (PNNL’s) 5G Fabricated Resource and Asset Management Encompassment for energy infrastructure (Energy FRAME) project funded by the Department of Energy Office…
Denial of Service Attack Detection via Differential Analysis of Generalized Entropy Progressions
Denial-of-Service (DoS) attacks are one of the most common and consequential cyber attacks in computer networks. While existing research offers a plethora of detection methods, the issue of achieving both scalability and high detection …
Codesign for Extreme Heterogeneity: Integrating Custom Hardware With Commodity Computing Technology to Support Next-Generation HPC Converged Workloads
The future of high-performance technical computing will be driven by the convergence of physical simulation, Artificial Intelligence (AI), Machine Learning (ML), and data science computing capabilities. While computational performance gain…
Direction-optimizing Label Propagation Framework for Structure Detection in Graphs: Design, Implementation, and Experimental Analysis
Label Propagation is not only a well-known machine learning algorithm for classification but also an effective method for discovering communities and connected components in networks. We propose a new Direction-optimizing Label Propagation…
MSREP: A Fast yet Light Sparse Matrix Framework for Multi-GPU Systems
Sparse linear algebra kernels play a critical role in numerous applications, covering from exascale scientific simulation to large-scale data analytics. Offloading linear algebra kernels on one GPU will no longer be viable in these applica…
MGG: Accelerating Graph Neural Networks with Fine-grained intra-kernel Communication-Computation Pipelining on Multi-GPU Platforms
The increasing size of input graphs for graph neural networks (GNNs) highlights the demand for using multi-GPU platforms. However, existing multi-GPU GNN systems optimize the computation and communication individually based on the conventi…
Technical Characterization and Benefit Evaluation of 5G-Enabled Grid Data Transport and Applications
This report summarizes the Year 1 work of Pacific Northwest National Laboratory’s (PNNL’s) 5G Fabricated Resource and Asset Management Encompassment for energy infrastructure (Energy FRAME) project funded by the Department of Energy Office…
Spy in the GPU-box: Covert and Side Channel Attacks on Multi-GPU Systems
The deep learning revolution has been enabled in large part by GPUs, and more recently accelerators, which make it possible to carry out computationally demanding training and inference in acceptable times. As the size of machine learning …
Bit-GraphBLAS: Bit-Level Optimizations of Matrix-Centric Graph Processing on GPU
In a general graph data structure like an adjacency matrix, when edges are homogeneous, the connectivity of two nodes can be sufficiently represented using a single bit. This insight has, however, not yet been adequately exploited by the e…
Hardware Evaluation Analytical Modeling and Node Simulation: Benefits of Tighter GPU Integration
In this report, we examine several emerging technologies of interest to the Department of Energy and its computational centers. These include: 1) quantifying the benefit of tighter CPU-GPU integration, 2) quantifying the appropriate CPU co…
Denial-of-Service Attack Detection via Differential Analysis of Generalized Entropy Progressions
Denial-of-Service (DoS) attacks are one of the most common and consequential cyber attacks in computer networks. While existing research offers a plethora of detection methods, the issue of achieving both scalability and high detection acc…
Leaky Buddies: Cross-Component Covert Channels on Integrated CPU-GPU Systems
Graphics Processing Units (GPUs) are a ubiquitous component across the range of today's computing platforms, from phones and tablets, through personal computers, to high-end server class platforms. With the increasing importance of graphic…
ARENA: Asynchronous Reconfigurable Accelerator Ring to Enable Data-Centric Parallel Computing
The next generation HPC and data centers are likely to be reconfigurable and data-centric due to the trend of hardware specialization and the emergence of data-driven applications. In this work, we propose ARENA – an asynchronous reconfigu…
pnnl/arena
CFA ARENA is a novel programming model with the support of a runtime targeting asynchronous data-centric execution paradigm in a distributed system. All the machine nodes in ARENA are connected by a ring network to bring the specialized co…
Fast and Scalable Sparse Triangular Solver for Multi-GPU Based HPC Architectures
Designing efficient and scalable sparse linear algebra kernels on modern multi-GPU based HPC systems is a daunting task due to significant irregular memory references and workload imbalance across the GPUs. This is particularly the case fo…
Leaky Buddies: Cross-Component Covert Channels on Integrated CPU-GPU Systems
Graphics Processing Units (GPUs) are a ubiquitous component across the range of today's computing platforms, from phones and tablets, through personal computers, to high-end server class platforms. With the increasing importance of grap…
ARENA: Asynchronous Reconfigurable Accelerator Ring to Enable Data-Centric Parallel Computing
The next generation HPC and data centers are likely to be reconfigurable and data-centric due to the trend of hardware specialization and the emergence of data-driven applications. In this paper, we propose ARENA -- an asynchronous reconfi…