Explanipedia

An Asynchronous Many-Task Algorithm for Unstructured $S_{N}$ Transport on Shared Memory Systems Open

Adam Elwood, Tom Deakin, Justin Lovegrove, Chris Nelson · 2025

Discrete ordinates $S_N$ transport solvers on unstructured meshes pose a challenge to scale due to complex data dependencies, memory access patterns and a high-dimensional domain. In this paper, we review the performance bottlenecks within…

Weight-Space Linear Recurrent Neural Networks Open

Roussel Desmond Nzoyem, Nawid Keshtmand, Eduardo Fernández, Idriss Tsayem, Raúl Santos‐Rodríguez, et al. · 2025

We introduce WARP (Weight-space Adaptive Recurrent Prediction), a simple yet powerful model that unifies weight-space learning with linear recurrence to redefine sequence modeling. Unlike conventional recurrent neural networks (RNNs) which…

Towards Foundational Models for Dynamical System Reconstruction: Hierarchical Meta-Learning via Mixture of Experts Open

Roussel Desmond Nzoyem, David A. W. Barton, Tom Deakin · 2025

As foundational models reshape scientific discovery, a bottleneck persists in dynamical system reconstruction (DSR): the ability to learn across system hierarchies. Many meta-learning approaches have been applied successfully to single sys…

Reevaluating Meta-Learning Optimization Algorithms Through Contextual Self-Modulation Open

Roussel Desmond Nzoyem, David A. W. Barton, Tom Deakin · 2024

Psychology Computer science Art

Contextual Self-Modulation (CSM) (Nzoyem et al., 2025) is a potent regularization mechanism for Neural Context Flows (NCFs) which demonstrates powerful meta-learning on physical systems. However, CSM has limitations in its applicability ac…

Neural Context Flows for Meta-Learning of Dynamical Systems Open

Roussel Desmond Nzoyem, David A. W. Barton, Tom Deakin · 2024

Computer science Psychology Physics

Neural Ordinary Differential Equations (NODEs) often struggle to adapt to new dynamic behaviors caused by parameter changes in the underlying physical system, even when these dynamics are similar to previously observed behaviors. This prob…

Preliminary report: Initial evaluation of StdPar implementations on AMD GPUs for HPC Open

Wei-Chen Lin, Simon McIntosh‐Smith, Tom Deakin · 2024

Computer science History

Until recently, AMD platforms have not supported offloading C++17 PSTL (StdPar) programs to the GPU. Our previous work highlights how StdPar is able to achieve good performance across NVIDIA and Intel GPU platforms. In that work, we acknowledged…

A Comparison of Mesh-Free Differentiable Programming and Data-Driven Strategies for Optimal Control under PDE Constraints Open

Roussel Desmond Nzoyem Ngueguin, David A. W. Barton, Tom Deakin · 2023

Computer science Mathematics Geography

The field of Optimal Control under Partial Differential Equations (PDE) constraints is rapidly changing under the influence of Deep Learning and the accompanying automatic differentiation libraries. Novel techniques like Physics-Informe…

Principles for Automated and Reproducible Benchmarking Open

T. Koskela, Ilektra A. Christidi, Mosé Giordano, Emily Dubrovska, James Quinn, et al. · 2023

Computer science Engineering Physics

The diversity in processor technology used by High Performance Computing (HPC) facilities is growing, and so applications must be written in such a way that they can attain high levels of performance across a range of different CPUs, GPUs,…

Programming Your GPU with OpenMP Open

Tom Deakin, Timothy G. Mattson · 2023

Computer science

The essential guide for writing portable, parallel programs for GPUs using the OpenMP programming model. Today's computers are complex, multi-architecture systems: multiple cores in a shared address space, graphics processing units (GPUs),…

Heterogeneous Programming for the Homogeneous Majority Open

Tom Deakin, James H. Cownie, Wei-Chen Lin, Simon McIntosh‐Smith · 2022

Computer science Physics Geography

In order to take advantage of the burgeoning diversity in processors at the frontier of supercomputing, the HPC community is migrating and improving codes to utilise heterogeneous nodes, where accelerators, principally GPUs, are highly pre…

Pulse shape simulations for organic scintillation detectors using Geant4 Open

Caroline Holroyd, Michael D. Aspinall, Tom Deakin · 2021

Physics Materials science Chemistry

The accurate simulation of the temporal pulse shapes from organic scintillation detectors capable of pulse shape discrimination (PSD) presents the opportunity to assess the pulse shape discrimination of these detectors prior to fabrication…

Hostile Cache Implications for Small, Dense Linear Solves Open

Tom Deakin, James H. Cownie, Simon McIntosh‐Smith, Justin Lovegrove, R.P. Smedley‐Stevenson · 2020

Computer science Mathematics Engineering

The full assembly of the stiffness matrix in finite element codes can be prohibitive in terms of memory footprint resulting from storing that enormous matrix. An optimisation and work around, particularly effective for discontinuous Galerk…

Interpreting and Visualizing Performance Portability Metrics Open

Jason Sewall, S. J. Pennycook, Douglas W. Jacobsen, Tom Deakin, Simon McIntosh‐Smith · 2020

Computer science

Recent work has introduced a number of tools and techniques for reasoning about the interplay between application performance and portability, or "performance portability". These tools have proven useful for setting goals and guiding high-…

Tracking Performance Portability on the Yellow Brick Road to Exascale Open

Tom Deakin, Andrei Poenaru, Tom Lin, Simon McIntosh‐Smith · 2020

Computer science

With Exascale machines on our immediate horizon, there is a pressing need for applications to be made ready to best exploit these systems. However, there will be multiple paths to Exascale, with each system relying on processor and acceler…

Reviewing the Computational Performance of Structured and Unstructured Grid Deterministic $S_{N}$ Transport Sweeps on Many-Core Architectures Open

Tom Deakin, Simon McIntosh‐Smith, Justin Lovegrove, R.P. Smedley‐Stevenson, Andrew Hagues · 2020

Computer science Mathematics Materials science

In recent years the computer processors underpinning the large, distributed, workhorse computers used to solve the Boltzmann transport equation have become ever more parallel and diverse. Traditional CPU architectures have increased in cor…

Benchmarking the first generation of production quality Arm‐based supercomputers Open

Simon McIntosh‐Smith, James Price, Andrei Poenaru, Tom Deakin · 2019

Computer science Business

In this paper, we present scaling results from two production quality supercomputers that use the first generation of Arm‐based CPUs that have been optimized for scientific workloads. Both systems use Marvell ThunderX2 CPUs, which deliver …

Performance Portability across Diverse Computer Architectures Open

Tom Deakin, Simon McIntosh‐Smith, James Price, Andrei Poenaru, Patrick Atkinson, et al. · 2019

Computer science Geography Art

Previous studies into performance portability have typically analysed a single application (and its various implementations) in isolation. In this study we explore the wider landscape of performance portability by considering a number of…

Reviewing the Computational Performance of Deterministic SN Transport Sweeps on Many-Core Architectures Open

Tom Deakin, Simon McIntosh‐Smith, Justin Lovegrove, R.P. Smedley‐Stevenson, Andrew Hagues · 2019

Computer science

[no abstract]

Developing a mini-app for exploring algorithms for unstructured mesh deterministic discrete ordinates transport on many-core architectures Open

Tom Deakin, Simon McIntosh‐Smith, Justin Lovegrove, R.P. Smedley‐Stevenson, Andrew Hagues · 2019

Computer science

Recent trends in computational architecture design are yielding processors with deep and complex memory hierarchies consisting of small capacity caches and large capacity main memory. CPU parallelism is also hierarchical, consisting of SIM…

Scaling Results From the First Generation of Arm-based Supercomputers Open

Simon McIntosh‐Smith, James Price, Andrei Poenaru, Tom Deakin · 2019

Computer science Mathematics

In this paper we present the first scaling results from Isambard, the first production supercomputer to be based on Arm CPUs that have been optimised specifically for HPC. Isambard is a Cray XC50 ‘Scout’ system, combining Marvell ThunderX2…

A performance analysis of the first generation of HPC‐optimized Arm processors Open

Simon McIntosh‐Smith, James Price, Tom Deakin, Andrei Poenaru · 2019

Computer science Engineering Geography

In this paper, we present performance results from Isambard, the first production supercomputer to be based on Arm CPUs that have been optimized specifically for HPC. Isambard is the first Cray XC50 “Scout” system, combining Cavium…

Evaluating attainable memory bandwidth of parallel programming models via BabelStream Open

Matt Martineau, Simon McIntosh‐Smith, James Price, Tom Deakin · 2017

Computer science

Many scientific codes consist of memory bandwidth bound kernels: the dominating factor of the runtime is the speed at which data can be loaded from memory into the Arithmetic Logic Units, before results are written back to memory. One major…

GPU-STREAM: now in 2D! Open

Tom Deakin, James Price, Matt Martineau, Simon McIntosh‐Smith · 2016

Computer science

We present a major update to the GPU-STREAM benchmark, first shown at SC’15. The original benchmark allowed comparison of achievable memory bandwidth performance through the STREAM kernels on OpenCL devices. GPU-STREAM v2.0 extends the ben…

Tom Deakin