Uniform memory access
DRISA
Data movement between the processing units and the memory in traditional von Neumann architecture is creating the "memory wall" problem. To bridge the gap, two approaches, the memory-rich processor (more on-chip memory) and the compute-cap…
Can far memory improve job throughput?
As memory requirements grow, and advances in memory technology slow, the availability of sufficient main memory is increasingly the bottleneck in large compute clusters. One solution to this is memory disaggregation, where jobs can remotel…
Clio: a hardware-software co-designed disaggregated memory system
Memory disaggregation has attracted great attention recently because of its benefits in efficient memory utilization and ease of management. So far, memory disaggregation research has all taken one of two approaches: building/emulating mem…
In-Memory Data Parallel Processor
Recent developments in Non-Volatile Memories (NVMs) have opened up a new horizon for in-memory computing. Despite the significant performance gain offered by computational NVMs, previous works have relied on manual mapping of specialized k…
Umpire: Application-focused management and coordination of complex hierarchical memory
Advanced architectures like Sierra provide a wide range of memory resources that must often be carefully controlled by the user. These resources have varying capacities, access timing rules, and visibility to different compute resources. A…
A New Approach to Automatic Memory Banking using Trace-Based Address Mining
Recent years have seen an increased deployment of FPGAs as programmable accelerators for improving the performance and energy efficiency of compute-intensive applications. A well-known "secret sauce" of achieving highly efficient FPGA acce…
An MIG-based compiler for programmable logic-in-memory architectures
Resistive memories have gained high research attention for enabling design of in-memory computing circuits and systems. We propose for the first time an automatic compilation methodology suited to a recently proposed computer architecture …
On the Memory Underutilization: Exploring Disaggregated Memory on HPC Systems
Large-scale high-performance computing (HPC) systems consist of massive compute and memory resources tightly coupled in nodes. We perform a large-scale study of memory utilization on four production HPC clusters. Our results show that more…
A Primer on Memory Consistency and Cache Coherence, Second Edition
Many modern computer systems, including homogeneous and heterogeneous architectures, support shared memory in hardware. In a shared memory system, each of the processor cores may read and write to a single shared address space. For a share…
Contention-Aware Dynamic Memory Bandwidth Isolation with Predictability in COTS Multicores: An Avionics Case Study
Airbus is investigating COTS multicore platforms for safety-critical avionics applications, pursuing helicopter-style autonomous and electric aircraft. These aircraft need to be ultra-lightweight for future mobility in the urban city lands…
Memory Sizing of a Scalable SRAM In-Memory Computing Tile Based Architecture
Modern computing applications require more and more data to be processed. Unfortunately, the trend in memory technologies does not scale as fast as the computing performances, leading to the so-called memory wall. New architectures are cur…
Disaggregated Cloud Memory with Elastic Block Management
With the growing importance of in-memory data processing, cloud service providers have launched large memory virtual machine services to accommodate memory intensive workloads. Such large memory services using low volume scaled-up machines…
Partial Failure Resilient Memory Management System for (CXL-based) Distributed Shared Memory
The efficiency of distributed shared memory (DSM) has been greatly improved by recent hardware technologies. But, the difficulty of distributed memory management can still be a major obstacle to the democratization of DSM, especially when …
NumaMMA
Testing Computation-in-Memory Architectures Based on Emerging Memories
Today's computing architectures and device technologies are incapable of meeting the increasingly stringent demands on energy and performance posed by evolving applications. Therefore, alternative novel post-CMOS computing architectures ar…
Reducing Data Movement on Large Shared Memory Systems by Exploiting Computation Dependencies
Shared memory systems are becoming increasingly complex as they typically integrate several storage devices. That brings different access latencies or bandwidth rates depending on the proximity between the cores where memory accesses are i…
Survey on memory management techniques in heterogeneous computing systems
A major issue faced by data scientists today is how to scale up their processing infrastructure to meet the challenge of big data and high‐performance computing (HPC) workloads. With today's HPC domain, it is required to connect multiple g…
Virtualizing Deep Neural Networks for Memory-Efficient Neural Network Design
The most widely used machine learning frameworks require users to carefully tune their memory usage so that the deep neural network (DNN) fits into the DRAM capacity of a GPU. This restriction hampers a researcher's flexibility to study di…
A memory scheduling strategy for eliminating memory access interference in heterogeneous system
Multiple CPUs and GPUs are integrated on the same chip to share memory, and access requests between cores are interfering with each other. Memory requests from the GPU seriously interfere with the CPU memory access performance. Requests be…
Automation in Distributed Shared Memory Testing for Multi-Processor Systems
This research paper explores the critical domain of automated testing for Distributed Shared Memory (DSM) systems in multi-processor environments. As the complexity of multi-core and distributed computing systems continues to grow, ensurin…
Hardware Implementation and Analysis of Gen-Z Protocol for Memory-Centric Architecture
With the increase in memory-intensive applications, a memory-centric architecture has been proposed in which the central processing units (CPUs) access a pool of fabric-attached memory. This architecture eliminates the dependency of system…
Empirical Memory-Access Cost Models in Multicore NUMA Architectures
Data location is of prime importance when scheduling tasks in a non-uniform memory access (NUMA) architecture. The characteristics of the NUMA architecture must be understood so tasks can be scheduled onto processors that are close to the …
PIM-trie: A Skew-resistant Trie for Processing-in-Memory
Memory latency and bandwidth are significant bottlenecks in designing in-memory indexes. Processing-in-memory (PIM), an emerging hardware design approach, alleviates this problem by embedding processors in memory modules, enabling low-late…
Effectively Prefetching Remote Memory with Leap
Memory disaggregation over RDMA can improve the performance of memory-constrained applications by replacing disk swapping with remote memory accesses. However, state-of-the-art memory disaggregation solutions still use data path components…
Understanding object-level memory access patterns across the spectrum
Memory accesses limit the performance and scalability of countless applications. Many design and optimization efforts will benefit from an in-depth understanding of memory access behavior, which is not offered by extant access tracing and …
An Efficient GPU Cache Architecture for Applications with Irregular Memory Access Patterns
GPUs provide high-bandwidth/low-latency on-chip shared memory and L1 cache to efficiently service a large number of concurrent memory requests. Specifically, concurrent memory requests accessing contiguous memory space are coalesced into w…
Miss Penalty Aware Cache Replacement for Hybrid Memory Systems
Current DRAM-based memory systems face the scalability challenges in terms of memory density, energy consumption, and monetary cost. Hybrid memory architectures composed of emerging nonvolatile memory (NVM) and DRAM is a promising approach…
CrypTag: Thwarting Physical and Logical Memory Vulnerabilities using Cryptographically Colored Memory
Memory vulnerabilities are a major threat to many computing systems. To effectively thwart spatial and temporal memory vulnerabilities, full logical memory safety is required. However, current mitigation techniques for memory safety are…
Harnessing Integrated CPU-GPU System Memory for HPC: a first look into Grace Hopper
Memory management across discrete CPU and GPU physical memory is traditionally achieved through explicit GPU allocations and data copy or unified virtual memory. The Grace Hopper Superchip, for the first time, supports an integrated CPU-GP…
Distributed-Memory FastFlow Building Blocks
We present the new distributed-memory run-time system (RTS) of the C++-based open-source structured parallel programming library FastFlow. The new RTS enables the execution of FastFlow shared-memory applications written using its Building…