David Black-Schaffer
Second-level Caches: Not for Instructions
Growing instruction footprints are straining processor front-ends, increasing fetch latency, and causing pipeline stalls. The universal approach to addressing this has been keeping instructions in each level of the cache hierarchy, but a p…
Mark–Scavenge: Waiting for Trash to Take Itself Out
Moving garbage collectors (GCs) typically free memory by evacuating live objects in order to reclaim contiguous memory regions. Evacuation is typically done either during tracing (scavenging), or after tracing when identification of live o…
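Although the abstract is truncated, the contrast it sets up (evacuating during tracing versus after tracing) can be made concrete. Below is a minimal Python sketch, not the paper's algorithm, of a moving collector that traces first and only evacuates afterwards, so reclamation waits until liveness is fully known; all names are illustrative.

def trace(roots):
    # Mark phase: find every object reachable from the roots.
    live, stack = set(), list(roots)
    while stack:
        obj = stack.pop()
        if id(obj) in live:
            continue
        live.add(id(obj))
        stack.extend(obj.fields)       # assumed: objects expose their references
    return live

def evacuate(region, live):
    # Relocation phase: copy survivors out, then reclaim the whole region.
    to_space = [obj for obj in region if id(obj) in live]
    region.clear()                     # contiguous memory reclaimed in one step
    return to_space                    # a real GC would also fix up pointers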
Mutator-Driven Object Placement using Load Barriers
Object placement impacts cache utilisation, which is itself critical for performance. Managed languages offer fewer tools for controlling object placement than unmanaged languages do, because of their abstract view of memory. On the other …
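As a rough illustration of the mechanism the title names (assumptions mine, not the paper's design): a load barrier intercepts every reference load, which gives the runtime a hook to observe what the mutator actually touches and to queue hot objects for relocation into a dense region.

HOT_THRESHOLD = 8          # assumed tuning knob, purely illustrative
hot_queue = []             # objects the GC may later relocate together

def load_barrier(obj, field):
    ref = getattr(obj, field)          # the actual reference load
    if ref is not None:
        ref.access_count = getattr(ref, "access_count", 0) + 1
        if ref.access_count == HOT_THRESHOLD:
            hot_queue.append(ref)      # candidate for mutator-driven placement
    return ref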
Protean: Resource-efficient Instruction Prefetching
Increases in code footprint and control flow complexity have made low-latency instruction fetch challenging. Dedicated Instruction Prefetchers (DIPs) can provide performance gains (up to 5%) for a subset of applications that are poorly ser…
Large-scale Graph Processing on Commodity Systems: Understanding and Mitigating the Impact of Swapping
Graph workloads are critical in many areas. Unfortunately, graph sizes have been increasing faster than DRAM capacity. As a result, large-scale graph processing necessarily falls back to virtual memory paging, resulting in tremendous perfo…
Exploring the Latency Sensitivity of Cache Replacement Policies
With DRAM latencies increasing relative to CPU speeds, the performance of caches has become more important. This has led to increasingly sophisticated replacement policies that require complex calculations to update their replacement metad…
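For context on what replacement-metadata updates look like on the access path, here is a toy LRU set in Python; the sophisticated policies the abstract refers to do substantially more work than this per access, and the sketch only shows where that work sits.

from collections import OrderedDict

class LRUSet:
    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()          # tag -> data, in recency order

    def access(self, tag):
        if tag in self.lines:
            self.lines.move_to_end(tag)     # metadata update on every hit
            return "hit"
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)  # evict the least-recently-used line
        self.lines[tag] = None
        return "miss"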
Faster Functional Warming with Cache Merging
SMARTS-like sampled hardware simulation techniques achieve good accuracy by simulating many small portions of an application in detail. However, while this reduces the simulation time, it results in extensive cache warming times, as each o…
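A minimal sketch of what functional warming means in SMARTS-style sampling (illustrative only, not this paper's cache-merging technique): between detailed samples, the reference stream is still pushed through a tag-only cache model so its state is warm when detailed simulation resumes, and this warming is what dominates sampling time.

class FunctionalCache:
    def __init__(self, num_sets, ways, line=64):
        self.sets = [[] for _ in range(num_sets)]
        self.num_sets, self.ways, self.line = num_sets, ways, line

    def access(self, addr):
        tag = addr // self.line
        ways = self.sets[tag % self.num_sets]
        if tag in ways:
            ways.remove(tag)
        elif len(ways) >= self.ways:
            ways.pop(0)                     # evict LRU
        ways.append(tag)                    # tag state only, no timing model

cache = FunctionalCache(num_sets=64, ways=8)
for addr in range(0, 1 << 20, 64):          # fast-forward region: warm tags only
    cache.access(addr)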
Dependence-aware Slice Execution to Boost MLP in Slice-out-of-order Cores
Exploiting memory-level parallelism (MLP) is crucial to hide long memory and last-level cache access latencies. While out-of-order (OoO) cores, and techniques building on them, are effective at exploiting MLP, they deliver poor energy effi…
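As background on what a "slice" is here (my sketch, not the paper's hardware): the backward dependence slice of a load is the chain of instructions that produce its address, and running such slices ahead of the rest of the program is what exposes MLP.

def backward_slice(instrs, load_idx):
    # instrs: list of (dest_reg, [src_regs]); returns indices in the slice.
    needed = set(instrs[load_idx][1])      # registers the load's address needs
    slice_ = {load_idx}
    for i in range(load_idx - 1, -1, -1):
        dest, srcs = instrs[i]
        if dest in needed:                 # producer of a needed register
            slice_.add(i)
            needed.discard(dest)
            needed.update(srcs)
    return sorted(slice_)

# e.g. r1=...; r3=...; r2=r1+4; load [r2] -> slice skips the r3 instruction
prog = [("r1", []), ("r3", []), ("r2", ["r1"]), (None, ["r2"])]
print(backward_slice(prog, 3))             # [0, 2, 3]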
Every walk’s a hit: making page walks single-access cache hits
As memory capacity has outstripped TLB coverage, large data applications suffer from frequent page table walks. We investigate two complementary techniques for addressing this cost: reducing the number of accesses required and reducing the…
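For reference, the cost being attacked is the standard radix walk; with the usual x86-64 4-level layout, every TLB miss takes four dependent memory accesses, one per level. The tiny table below is assumed data, just to make the sketch runnable.

def walk(memory, cr3, vaddr):
    entry = cr3
    for shift in (39, 30, 21, 12):         # PML4, PDPT, PD, PT levels
        index = (vaddr >> shift) & 0x1FF   # 9 index bits per level
        entry = memory[entry + index * 8]  # one memory access per level
    return entry | (vaddr & 0xFFF)         # frame base + page offset

# Map virtual page 0 to physical frame 0x5000 with trivial table bases.
mem = {0x1000: 0x2000, 0x2000: 0x3000, 0x3000: 0x4000, 0x4000: 0x5000}
print(hex(walk(mem, 0x1000, 0x0ABC)))      # 0x5abc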
Freeway to Memory Level Parallelism in Slice-Out-of-Order Cores
Exploiting memory level parallelism (MLP) is crucial to hide long memory and last level cache access latencies. While out-of-order (OoO) cores, and techniques building on them, are effective at exploiting MLP, they deliver poor energy effi…
Early Address Prediction
Achieving low load-to-use latency with low energy and storage overheads is critical for performance. Existing techniques either prefetch into the pipeline (via address prediction and validation) or provide data reuse in the pipeline (via r…
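To place the "address prediction and validation" family the abstract mentions (this is a generic stride predictor, not this paper's specific mechanism): predict the next address from the last observed stride, fetch early, and validate against the real address, replaying on a mismatch.

class StridePredictor:
    def __init__(self):
        self.table = {}                    # pc -> (last_addr, stride)

    def predict(self, pc):
        if pc in self.table:
            last, stride = self.table[pc]
            return last + stride           # speculative early address
        return None

    def update(self, pc, addr):
        last, _ = self.table.get(pc, (addr, 0))
        self.table[pc] = (addr, addr - last)

p, pc = StridePredictor(), 0x40
for addr in (100, 108, 116):
    guess = p.predict(pc)
    p.update(pc, addr)
    print(guess, "vs", addr)               # wrong guesses must be replayed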
A Reusable Characterization of the Memory System Behavior of SPEC2017 and SPEC2006
The SPEC CPU Benchmarks are used extensively for evaluating and comparing improvements to computer systems. This ubiquity makes characterization critical for researchers to understand the bottlenecks the benchmarks do and do not expose and…
Raw-Data: A Reusable Characterization Of The Memory System Behavior Of SPEC 2017 And SPEC 2006
This dataset accompanies the ISPASS 2020 extended abstract "Architecturally-Independent and Time-Based Characterization of SPEC CPU 2017" and the TACO paper "A Reusable Characterization of the Memory System Behavior of SPEC2017 and SPEC2006". In…
Page Tables: Keeping them Flat and Hot (Cached)
As memory capacity has outstripped TLB coverage, large data applications suffer from frequent page table walks. We investigate two complementary techniques for addressing this cost: reducing the number of accesses required and reducing the…
Architecturally-Independent and Time-Based Characterization of SPEC CPU 2017
Characterizing the memory behaviour of SPEC CPU benchmarks is critical to analyze bottlenecks in the execution. Unfortunately, most prior characterizations are tied to a particular system (e.g., via performance counters, fixed configuratio…
Modeling and optimizing NUMA effects and prefetching with machine learning
Both NUMA thread/data placement and hardware prefetcher configuration have significant impacts on HPC performance. Optimizing both together leads to a large and complex design space that has previously been impractical to explore at runtim…
Perforated Page: Supporting Fragmented Memory Allocation for Large Pages
The availability of large pages has dramatically improved the efficiency of address translation for applications that use large contiguous regions of memory. However, large pages can be difficult to allocate due to fragmented memory, non-m…
Efficient thread/page/parallelism autotuning for NUMA systems
Current multi-socket systems have complex memory hierarchies with significant Non-Uniform Memory Access (NUMA) effects: memory performance depends on the location of the data and the thread. This complexity means that thread- and data-mapp…
Filter caching for free
Modern processors contain store-buffers to allow stores to retire under a miss, thus hiding store-miss latency. The store-buffer needs to be large (for performance) and searched on every load (for correctness), thereby making it a costly s…
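The cost structure the abstract describes can be seen in a toy model (illustrative; the paper's contribution is getting filter-cache-like reuse out of this already-existing structure): every load associatively searches the buffer, newest store first, so capacity helps performance but every added entry is searched on every load.

class StoreBuffer:
    def __init__(self, entries):
        self.entries = entries
        self.buf = []                      # (addr, value), oldest first

    def store(self, addr, value):
        if len(self.buf) >= self.entries:
            self.buf.pop(0)                # oldest store drains to the cache
        self.buf.append((addr, value))

    def load(self, addr):
        for a, v in reversed(self.buf):    # associative search, newest first
            if a == addr:
                return v                   # store-to-load forwarding hit
        return None                        # miss: go to the L1 cache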
FIFOrder MicroArchitecture: Ready-Aware Instruction Scheduling for OoO Processors
The number of instructions a processor's instruction queue can examine (depth) and the number it can issue together (width) determine its ability to take advantage of the ILP in an application. Unfortunately, increasing either the width or…
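One way to read "ready-aware scheduling" (details assumed here, not taken from the paper): instructions whose sources are already available at dispatch need no wakeup logic, so they can be steered to a cheap in-order FIFO, leaving the expensive out-of-order instruction queue for the rest.

def steer(instr, ready_regs, fifo, iq):
    if all(src in ready_regs for src in instr["srcs"]):
        fifo.append(instr)     # ready at dispatch: a cheap FIFO suffices
    else:
        iq.append(instr)       # must wait and wake up: needs an IQ entry

fifo, iq = [], []
ready = {"r1", "r2"}
steer({"op": "add", "srcs": ["r1", "r2"]}, ready, fifo, iq)   # -> fifo
steer({"op": "mul", "srcs": ["r7"]}, ready, fifo, iq)         # -> iq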
Freeway: Maximizing MLP for Slice-Out-of-Order Execution
Exploiting memory level parallelism (MLP) is crucial to hide long memory and last level cache access latencies. While out-of-order (OoO) cores, and techniques building on them, are effective at exploiting MLP, they deliver poor energy effi…
Minimizing Replay under Way-Prediction
Way-predictors are effective at reducing dynamic cache energy by reducing the number of ways accessed, but introduce additional latency for incorrect way-predictions. While previous work has studied the impact of the increased latency for …
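A toy model of the latency effect under study (illustrative only): a correct way-prediction reads a single way, while a wrong one forces a replay that probes the full set and retrains the predictor.

def access(cache_set, predictor, tag):
    way = predictor.get(tag, 0)            # predicted way for this tag
    if cache_set[way] == tag:
        return "hit", 1                    # one way read, minimal energy
    for w, t in enumerate(cache_set):      # replay: full-set lookup
        if t == tag:
            predictor[tag] = w             # retrain the predictor
            return "hit-after-replay", 2   # the extra latency being studied
    return "miss", 2

pred = {}
s = ["A", "B", "C", "D"]                   # tags currently in a 4-way set
print(access(s, pred, "C"))                # ('hit-after-replay', 2)
print(access(s, pred, "C"))                # ('hit', 1)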
Maximizing Limited Resources: a Limit-Based Study and Taxonomy of Out-of-Order Commit
Out-of-order execution is essential for high performance, general-purpose computation, as it can find and execute useful work instead of stalling. However, it is typically limited by the requirement of visibly sequential, atomic instructio…
Behind the Scenes: Memory Analysis of Graphical Workloads on Tile-Based GPUs
Graphics rendering is a complex multi-step process whose data demands typically dominate memory system design in SoCs. GPUs create images by merging many simpler scenes for each frame. For performance, scenes are tiled into parallel tasks …
Understanding the interplay between task scheduling, memory and performance
New programming models have been introduced to help programmers deal with the complexity of large-scale systems, simplifying the coding process and making applications more scalable. Task-based programming is one example that became p…
Exploring Scheduling Effects on Task Performance with TaskInsight
The complex memory hierarchies of today's machines make it very difficult to estimate task execution time: depending on where the data is placed in memory, tasks of the same type may end up having different performance. Mult…
TaskInsight
Recent scheduling heuristics for task-based applications have managed to improve their performance by taking into account memory-related properties such as data locality and cache sharing. However, there is still a general lack of tools that can provi…
Adaptive Cache Warming for Faster Simulations
The use of hardware-based virtualization allows modern simulators to very quickly fast-forward between sample points and regions of interest. This dramatically reduces the simulation time compared to traditional functional forwarding. Howe…
Characterizing Task Scheduling Performance Based on Data Reuse
Over the past years, several scheduling heuristics have been introduced to improve the performance of task-based applications, with schedulers increasingly becoming aware of memory-related bottlenec…
Spatial and Temporal Cache Sharing Analysis in Tasks
Proceedings of the First PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016), Timisoara, Romania, February 8–11, 2016.