Tom Deakin
An Asynchronous Many-Task Algorithm for Unstructured $S_{N}$ Transport on Shared Memory Systems
Discrete ordinates $S_N$ transport solvers on unstructured meshes pose a challenge to scale due to complex data dependencies, memory access patterns and a high-dimensional domain. In this paper, we review the performance bottlenecks within…
Weight-Space Linear Recurrent Neural Networks
We introduce WARP (Weight-space Adaptive Recurrent Prediction), a simple yet powerful model that unifies weight-space learning with linear recurrence to redefine sequence modeling. Unlike conventional recurrent neural networks (RNNs) which…
Towards Foundational Models for Dynamical System Reconstruction: Hierarchical Meta-Learning via Mixture of Experts
As foundational models reshape scientific discovery, a bottleneck persists in dynamical system reconstruction (DSR): the ability to learn across system hierarchies. Many meta-learning approaches have been applied successfully to single sys…
Reevaluating Meta-Learning Optimization Algorithms Through Contextual Self-Modulation
Contextual Self-Modulation (CSM) (Nzoyem et al., 2025) is a potent regularization mechanism for Neural Context Flows (NCFs) which demonstrates powerful meta-learning on physical systems. However, CSM has limitations in its applicability ac…
Neural Context Flows for Meta-Learning of Dynamical Systems
Neural Ordinary Differential Equations (NODEs) often struggle to adapt to new dynamic behaviors caused by parameter changes in the underlying physical system, even when these dynamics are similar to previously observed behaviors. This prob…
Preliminary report: Initial evaluation of StdPar implementations on AMD GPUs for HPC
Recently, AMD platforms have not supported offloading C++17 PSTL (StdPar) programs to the GPU. Our previous work highlights how StdPar is able to achieve good performance across NVIDIA and Intel GPU platforms. In that work, we acknowledged…
A Comparison of Mesh-Free Differentiable Programming and Data-Driven Strategies for Optimal Control under PDE Constraints
The field of Optimal Control under Partial Differential Equations (PDE) constraints is rapidly changing under the influence of Deep Learning and the accompanying automatic differentiation libraries. Novel techniques like Physics-Informe…
Principles for Automated and Reproducible Benchmarking
The diversity in processor technology used by High Performance Computing (HPC) facilities is growing, and so applications must be written in such a way that they can attain high levels of performance across a range of different CPUs, GPUs,…
Programming Your GPU with OpenMP
The essential guide for writing portable, parallel programs for GPUs using the OpenMP programming model. Today's computers are complex, multi-architecture systems: multiple cores in a shared address space, graphics processing units (GPUs),…
Heterogeneous Programming for the Homogeneous Majority
In order to take advantage of the burgeoning diversity in processors at the frontier of supercomputing, the HPC community is migrating and improving codes to utilise heterogeneous nodes, where accelerators, principally GPUs, are highly pre…
Pulse shape simulations for organic scintillation detectors using Geant4
The accurate simulation of the temporal pulse shapes from organic scintillation detectors capable of pulse shape discrimination (PSD) presents the opportunity to assess the pulse shape discrimination of these detectors prior to fabrication…
Hostile Cache Implications for Small, Dense Linear Solves
The full assembly of the stiffness matrix in finite element codes can be prohibitive in terms of memory footprint resulting from storing that enormous matrix. An optimisation and workaround, particularly effective for discontinuous Galerk…
Interpreting and Visualizing Performance Portability Metrics
Recent work has introduced a number of tools and techniques for reasoning about the interplay between application performance and portability, or "performance portability". These tools have proven useful for setting goals and guiding high-…
Tracking Performance Portability on the Yellow Brick Road to Exascale
With Exascale machines on our immediate horizon, there is a pressing need for applications to be made ready to best exploit these systems. However, there will be multiple paths to Exascale, with each system relying on processor and acceler…
Reviewing the Computational Performance of Structured and Unstructured Grid Deterministic $S_N$ Transport Sweeps on Many-Core Architectures
In recent years the computer processors underpinning the large, distributed, workhorse computers used to solve the Boltzmann transport equation have become ever more parallel and diverse. Traditional CPU architectures have increased in cor…
Benchmarking the first generation of production quality Arm‐based supercomputers
In this paper, we present scaling results from two production quality supercomputers that use the first generation of Arm‐based CPUs that have been optimized for scientific workloads. Both systems use Marvell ThunderX2 CPUs, which deliver …
Performance Portability across Diverse Computer Architectures
Previous studies into performance portability have typically analysed a single application (and its various implementations) in isolation. In this study we explore the wider landscape of performance portability by considering a number of…
Reviewing the Computational Performance of Deterministic $S_N$ Transport Sweeps on Many-Core Architectures
Developing a mini-app for exploring algorithms for unstructured mesh deterministic discrete ordinates transport on many-core architectures
Recent trends in computational architecture design are yielding processors with deep and complex memory hierarchies consisting of small capacity caches and large capacity main memory. CPU parallelism is also hierarchical, consisting of SIM…
Scaling Results From the First Generation of Arm-based Supercomputers
In this paper we present the first scaling results from Isambard, the first production supercomputer to be based on Arm CPUs that have been optimised specifically for HPC. Isambard is a Cray XC50 ‘Scout’ system, combining Marvell ThunderX2…
A performance analysis of the first generation of HPC‐optimized Arm processors
In this paper, we present performance results from Isambard, the first production supercomputer to be based on Arm CPUs that have been optimized specifically for HPC. Isambard is the first Cray XC50 “Scout” system, combining Cavium…
Evaluating attainable memory bandwidth of parallel programming models via BabelStream
Many scientific codes consist of memory bandwidth bound kernels: the dominating factor of the runtime is the speed at which data can be loaded from memory into the Arithmetic Logic Units, before results are written back to memory. One major…
GPU-STREAM: now in 2D!
We present a major update to the GPU-STREAM benchmark, first shown at SC’15. The original benchmark allowed comparison of achievable memory bandwidth performance through the STREAM kernels on OpenCL devices. GPU-STREAM v2.0 extends the ben…
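The BabelStream and GPU-STREAM entries above describe measuring achievable memory bandwidth with the STREAM kernels. The bandwidth figure such benchmarks report is bytes moved divided by kernel runtime. The minimal Python sketch below shows only that arithmetic, assuming the conventional STREAM counting of one read or one write per array element, no credit for cache reuse, and 8-byte doubles. The names `ARRAYS_TOUCHED`, `bytes_moved`, `bandwidth_gbs` and `triad` are hypothetical and are not taken from either benchmark's code. Pure Python is far too slow to measure real memory bandwidth, so the reference kernel is included for checking results, not for timing.

```python
# Arrays touched (read or written) per element by each STREAM-style kernel.
# Assumption: conventional STREAM byte counting, no cache reuse credited.
ARRAYS_TOUCHED = {
    "copy": 2,   # c[i] = a[i]
    "mul": 2,    # b[i] = scalar * c[i]
    "add": 3,    # c[i] = a[i] + b[i]
    "triad": 3,  # a[i] = b[i] + scalar * c[i]
    "dot": 2,    # sum += a[i] * b[i]
}


def bytes_moved(kernel: str, n: int, word_bytes: int = 8) -> int:
    """Bytes read plus written by one call of `kernel` on n-element arrays."""
    return ARRAYS_TOUCHED[kernel] * word_bytes * n


def bandwidth_gbs(kernel: str, n: int, seconds: float, word_bytes: int = 8) -> float:
    """Achieved bandwidth in GB/s (1 GB = 1e9 bytes) for one timed kernel call."""
    return bytes_moved(kernel, n, word_bytes) / seconds / 1e9


def triad(b, c, scalar):
    """Reference triad a = b + scalar * c, for checking results, not timing."""
    return [bi + scalar * ci for bi, ci in zip(b, c)]


if __name__ == "__main__":
    n = 2**25  # 33,554,432 doubles, i.e. 256 MiB per array
    # Hypothetical input: a triad call over three such arrays taking 10 ms.
    print(f"{bandwidth_gbs('triad', n, 0.010):.1f} GB/s")
```

With these hypothetical inputs the triad moves 3 × 8 × 2^25 = 805,306,368 bytes, so a 10 ms runtime corresponds to roughly 80.5 GB/s.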