Michael J. Brim
From Edge to HPC: Investigating Cross-Facility Data Streaming Architectures
In this paper, we investigate three cross-facility data streaming architectures, Direct Streaming (DTS), Proxied Streaming (PRS), and Managed Service Streaming (MSS). We examine their architectural variations in data flow paths and deploym…
A Study on Messaging Trade-offs in Data Streaming for Scientific Workflows
Memory-to-memory data streaming is essential for modern scientific workflows that require near real-time data analysis, experimental steering, and informed decision-making during experiment execution. It eliminates the latency bottlenecks …
Secure API-Driven Research Automation to Accelerate Scientific Discovery
The Secure Scientific Service Mesh (S3M) provides API-driven infrastructure to accelerate scientific discovery through automated research workflows. By integrating near real-time streaming capabilities, intelligent workflow orchestration, …
Enabling Seamless Transitions from Experimental to Production HPC for Interactive Workflows
The evolving landscape of scientific computing requires seamless transitions from experimental to production HPC environments for interactive workflows. This paper presents a structured transition pathway developed at OLCF that bridges the…
Lustre Unveiled: Evolution, Design, Advancements, and Current Trends
The Lustre filesystem serves as a vital element in high-performance parallel storage, meeting the rising demands of scientific, research, and enterprise environments. Widely deployed across HPC environments, ranging from small-scale applic…
Empowering Scientific Innovation Through An Integrated Research Infrastructure: The Role of the Advanced Computing Ecosystem
As the landscape of computational science evolves, the Department of Energy (DOE) is reimagining the roles of its large-scale computing facilities to meet emerging research challenges. The Integrated Research Infrastructure (IRI) program a…
OLCF Test Harness
Acceptance and regression testing of a High Performance Computing (HPC) system requires an automated and reproducible framework and tool for running and logging results. Manually running tests across a system is labor intensive and prone t…
Privacy Preserving Federated Learning for Advanced Scientific Ecosystems
We present a framework to provide privacy preserving (PP) federated learning (FL) across multiple computational and experimental facilities. This work joins the compute capabilities of National Energy Research Scientific Computing Center …
OLCF’s Advanced Computing Ecosystem (ACE): FY24 Efforts for the DOE Integrated Research Infrastructure (IRI) Program
This report highlights significant strides made by Oak Ridge National Laboratory’s Oak Ridge Leadership Computing Facility (OLCF) in advancing computational research and infrastructure. Through the Advanced Computing Ecosystem (ACE) strate…
Scaling the Summit: Deploying the World's Fastest Supercomputer
Summit, the latest flagship supercomputer deployed at Oak Ridge Leadership Computing Facility (OLCF), became the number one system in the Top500 list in June 2018 and remains in the top spot in the most recent edition of the list. An exten…
A High-level Design for Bidirectional Data Streaming to High-Performance Computing Systems from External Science Facilities
Cutting-edge science is increasingly data-driven due to the emergence of scientific machine learning models that can guide scientists toward fruitful areas of exploration. Experimental science facilities such as light and neutron sources, …
Leveraging Single-Page Applications for Seamless Scientific Workflows: DevSecOps Considerations
Single-page applications (SPAs) have become indispensable in modern frontend development, with widespread adoption in scientific applications. The process of creating a single-page web application development environment which accurately r…
INTERSECT-SDK (Python)
Interconnected Science Ecosystem - Software Development Kit (INTERSECT-SDK)
Enabling Interconnected Science Workflows through an Adapter Approach
The INTERSECT Software framework project aims to create an open federated library that connects, coordinates, and controls systems in the scientific domain. It features the Adapter, a flexible and extensible interface inspired by the Adapt…
Best practices for documenting a scientific Python project
Documentation is a crucial component of software development that helps users with installation and usage of the software. Documentation also helps onboard new developers to a software project with contributing guidelines and API informati…
Are We Witnessing the Spectre of an HPC Meltdown?
Frontier: Exploring Exascale
As the US Department of Energy (DOE) computing facilities began deploying petascale systems in 2008, DOE was already setting its sights on exascale. In that year, DARPA published a report on the feasibility of reaching exascale. The report…
INTERSECT Architecture Specification: Microservice Architecture (V.0.9)
Oak Ridge National Laboratory (ORNL)’s Self-driven Experiments for Science / Interconnected Science Ecosystem (INTERSECT) architecture project, titled “An Open Federated Architecture for the Laboratory of the Future”, creates an open feder…
UnifyFS: A User-level Shared File System for Unified Access to Distributed Local Storage
We introduce UnifyFS, a user-level file system that aggregates node-local storage tiers available on high performance computing (HPC) systems and makes them available to HPC applications under a unified namespace. UnifyFS employs transpare…
High Performance Computing Facility Operational Assessment 2022: Oak Ridge Leadership Computing Facility
The Oak Ridge Leadership Computing Facility (OLCF) was established to accelerate scientific discovery by providing world-leading computational performance and advanced data infrastructure. As a US Department of Energy (DOE) Office of Scien…
INTERSECT Architecture Specification: Microservice Architecture (V.0.5)
Oak Ridge National Laboratory (ORNL)’s Self-driven Experiments for Science / Interconnected Science Ecosystem (INTERSECT) architecture project, titled “An Open Federated Architecture for the Laboratory of the Future”, creates an open feder…
Approaching the Final Frontier: Lessons Learned from the Deployment of HPE/Cray EX Spock and Crusher supercomputers
The INTERSECT Open Federated Architecture for the Laboratory of the Future
Are we witnessing the spectre of an HPC meltdown?
We measure and analyze the performance observed when running applications and benchmarks before and after the Meltdown and Spectre fixes have been applied to the Cray supercomputers and supporting systems at the Oak Ridge Leadershi…
I/O load balancing for big data HPC applications
High Performance Computing (HPC) big data problems require efficient distributed storage systems. However, at scale, such storage systems often experience load imbalance and resource contention due to two factors: the bursty nature of scie…
UNITY
This paper describes the vision for UNITY, a new high-performance computing focused data storage abstraction that places the entire memory hierarchy, including both traditionally separated memory- and file-based data storage, into one stora…
Proceedings of the 2015 International Workshop on the Lustre Ecosystem: Challenges and Opportunities
The Lustre parallel file system has been widely adopted by high-performance computing (HPC) centers as an effective system for managing large-scale storage resources. Lustre achieves unprecedented aggregate performance by parallelizing I/O…
Evaluating Dynamic File Striping For Lustre
We define dynamic striping as the ability to assign different Lustre striping characteristics to contiguous segments of a file as it grows. In this paper, we evaluate the effects of dynamic striping using a watermark-based strategy where t…
Monitoring Extreme-scale Lustre Toolkit
We discuss the design and ongoing development of the Monitoring Extreme-scale Lustre Toolkit (MELT), a unified Lustre performance monitoring and analysis infrastructure that provides continuous, low-overhead summary information on the heal…