Register file

ZombieLoad: Cross-Privilege-Boundary Data Sampling
In early 2018, Meltdown first showed how to read arbitrary kernel memory from user space by exploiting side-effects from transient instructions. While this attack has been mitigated through stronger isolation boundaries between user and ke…

SoK: Shining Light on Shadow Stacks
Control-Flow Hijacking attacks are the dominant attack vector against C/C++ programs. Control-Flow Integrity (CFI) solutions mitigate these attacks on the forward edge, i.e., indirect calls through function pointers and virtual calls. Prot…
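The abstract is cut off before it reaches the backward edge (function returns), which is where shadow stacks apply. As a generic illustration only, and not any specific design evaluated in the SoK, a shadow stack keeps a second, protected copy of every return address and compares the two on return. The names `ShadowStack`, `call`, `ret`, and `ControlFlowViolation` are hypothetical, and the protection of the shadow copy is simply assumed rather than modeled.

```python
class ControlFlowViolation(Exception):
    """Raised when a return address does not match its shadow copy."""


class ShadowStack:
    """Toy model: a regular call stack plus a separate shadow stack.

    Assumption: an attacker can overwrite entries on the regular stack
    (e.g. through a buffer overflow) but cannot modify the shadow stack.
    """

    def __init__(self) -> None:
        self.stack: list[int] = []   # return addresses visible to the program
        self.shadow: list[int] = []  # protected copies of the same addresses

    def call(self, return_address: int) -> None:
        # On every call, push the return address onto both stacks.
        self.stack.append(return_address)
        self.shadow.append(return_address)

    def ret(self) -> int:
        # On return, pop both copies and compare before transferring control.
        target = self.stack.pop()
        expected = self.shadow.pop()
        if target != expected:
            raise ControlFlowViolation(
                f"return to {target:#x}, expected {expected:#x}"
            )
        return target


if __name__ == "__main__":
    s = ShadowStack()
    s.call(0x401000)
    s.call(0x402000)
    s.stack[-1] = 0xDEADBEEF  # simulate a corrupted return address
    try:
        s.ret()
    except ControlFlowViolation as err:
        print(err)  # return to 0xdeadbeef, expected 0x402000
```

Real shadow stacks differ mainly in how the shadow copy is protected (e.g. hardware support or memory isolation), which this sketch deliberately leaves out.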

Ara: A 1-GHz+ Scalable and Energy-Efficient RISC-V Vector Processor with Multiprecision Floating-Point Support in 22-nm FD-SOI
In this article, we present Ara, a 64-bit vector processor based on the version 0.5 draft of RISC-V's vector extension, implemented in GlobalFoundries 22FDX fully depleted silicon-on-insulator (FD-SOI) technology. Ara's microarchitecture i…

Snitch: A Tiny Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads
Data-parallel applications, such as data analytics, machine learning, and scientific computing, are placing an ever-growing demand on floating-point operations per second on emerging systems. With increasing integration density, the que…

Vitruvius+: An Area-Efficient RISC-V Decoupled Vector Coprocessor for High Performance Computing Applications
The maturity level of RISC-V and the availability of domain-specific instruction set extensions, like vector processing, make RISC-V a good candidate for supporting the integration of specialized hardware in processor cores for the High Pe…

Stream Semantic Registers: A Lightweight RISC-V ISA Extension Achieving Full Compute Utilization in Single-Issue Cores
Single-issue processor cores are very energy efficient but suffer from the von Neumann bottleneck, in that they must explicitly fetch and issue the loads/stores necessary to feed their ALU/FPU. Each instruction spent on moving data is a cy…
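To make the cost of explicit loads concrete, the sketch below is a back-of-the-envelope issue-slot model for a dot product on a single-issue core. It assumes one instruction issues per cycle, that the baseline needs two explicit loads per fused multiply-add, and an idealized streaming mode in which hardware address generators (a hypothetical stand-in for stream semantic registers, not the paper's implementation) supply both operands so only the FMA issues. Loop overhead is ignored, and all function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class IssueCount:
    loads: int  # issue slots spent moving data
    fmas: int   # issue slots doing floating-point work

    @property
    def total(self) -> int:
        return self.loads + self.fmas

    @property
    def fpu_utilization(self) -> float:
        # Fraction of issue slots that perform useful floating-point work.
        return self.fmas / self.total if self.total else 0.0


def dot_product_baseline(n: int) -> IssueCount:
    # Per element: load a[i], load b[i], then fmadd acc, a[i], b[i].
    return IssueCount(loads=2 * n, fmas=n)


def dot_product_streamed(n: int) -> IssueCount:
    # Assumed idealized streaming: operand fetches are generated by
    # hardware, so only the FMAs occupy issue slots.
    return IssueCount(loads=0, fmas=n)


def affine_stream(base: int, stride: int, count: int) -> list[int]:
    # Addresses a simple affine stream generator would emit (hypothetical helper).
    return [base + i * stride for i in range(count)]


if __name__ == "__main__":
    n = 1024
    print(dot_product_baseline(n).fpu_utilization)  # about 0.333
    print(dot_product_streamed(n).fpu_utilization)  # 1.0
    print([hex(a) for a in affine_stream(0x1000, 8, 4)])
```

Under these assumptions, two of every three issue slots in the baseline go to data movement, which is the gap the abstract describes.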

A Survey of Techniques for Architecting and Managing GPU Register File
To support their massively multithreaded architecture, GPUs use a very large register file (RF), with a capacity higher than even the L1 and L2 caches. In total contrast, traditional CPUs use a tiny RF and much larger caches to optimize latenc…

DNN Dataflow Choice Is Overrated.
Many DNN accelerators have been proposed and built using different microarchitectures and program mappings. To fairly compare these different approaches, we modified the Halide compiler to produce hardware as well as CPU and GPU code, and …

LTRF
Graphics Processing Units (GPUs) employ large register files to accommodate all active threads and accelerate context switching. Unfortunately, register files are a scalability bottleneck for future GPUs due to long access latency, high po…

Precise Runahead Execution
Runahead execution improves processor performance by accurately prefetching long-latency memory accesses. When a long-latency load causes the instruction window to fill up and halt the pipeline, the processor enters runahead mode and keeps…

CORF
The Register File (RF) in GPUs is a critical structure that maintains the state for thousands of threads that support the GPU processing model. The RF organization substantially affects the overall performance and the energy efficiency of …

Stencil codes on a vector length agnostic architecture
Data-level parallelism is frequently ignored or underutilized. Achieved through vector/SIMD capabilities, it can provide substantial performance improvements on top of widely used techniques such as thread-level parallelism. However, manua…

Linebacker
Modern GPUs suffer from cache contention due to the limited cache size that is shared across tens of concurrently running warps. To increase the per-warp cache size, prior techniques proposed warp throttling, which limits the number of activ…

A survey of techniques for designing and managing CPU register file
The processor register file (RF) is an important microarchitectural component used for storing operands and results of instructions. The design and operation of the RF have a crucial impact on the performance, energy efficiency, and reliabil…

AVPP
Value prediction improves instruction level parallelism in superscalar processors by breaking true data dependencies. Although this technique can significantly improve overall performance, most of the state-of-the-art value prediction appr…
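Value prediction lets a dependent instruction proceed with a guessed operand instead of waiting for its producer, with the guess verified later. The abstract does not describe AVPP's own predictor, so the sketch below swaps in a generic last-value-plus-stride predictor with a saturating confidence counter, a common textbook scheme used here purely for illustration. Indexing by PC, the 2-bit counter, and the confidence threshold are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entry:
    last: int = 0        # most recent committed value for this PC
    stride: int = 0      # difference between the last two values
    confidence: int = 0  # 2-bit saturating counter (0..3)


class StridePredictor:
    """Toy stride value predictor indexed by instruction PC (assumed)."""

    def __init__(self, threshold: int = 2) -> None:
        self.table: Dict[int, Entry] = {}
        self.threshold = threshold

    def predict(self, pc: int) -> Optional[int]:
        # Speculate only when confidence is high enough; otherwise the
        # consumer waits for the real value as usual.
        e = self.table.get(pc)
        if e is None or e.confidence < self.threshold:
            return None
        return e.last + e.stride

    def update(self, pc: int, actual: int) -> None:
        # Train on the committed value produced by the instruction at `pc`.
        e = self.table.get(pc)
        if e is None:
            self.table[pc] = Entry(last=actual)  # first sighting, no stride yet
            return
        new_stride = actual - e.last
        if new_stride == e.stride:
            e.confidence = min(e.confidence + 1, 3)  # stride confirmed
        else:
            e.stride = new_stride  # learn the new stride, restart confidence
            e.confidence = 0
        e.last = actual


if __name__ == "__main__":
    p = StridePredictor()
    for value in (100, 104, 108, 112):
        p.update(0x40, value)
    print(p.predict(0x40))  # 116
```

The confidence threshold trades coverage for accuracy: a lower threshold predicts more often but risks more costly mispredictions and pipeline recovery.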

Using Arm’s scalable vector extension on stencil codes
Data-level parallelism is frequently ignored or underutilized. Achieved through vector/SIMD capabilities, it can provide substantial performance improvements on top of widely used techniques such as thread-level parallelism. However, manua…

Nwise and Pwise: 10T Radiation Hardened SRAM Cells for Space Applications With High Reliability Requirements
SRAM cells are widely used to design memory blocks of, e.g., caches, register files, and translation lookaside buffers. Depending on the SRAM application, the design requirements are different. For instance, in space applications, alongsid…

All-Digital Energy-Constrained Controller for General-Purpose Accelerators and CPUs
Considering the energy-cap problem in battery-powered devices, DVFS and power gating represent the de facto state-of-the-art actuators. However, the limited margin to reduce the operating voltage, the impossibility to massively integrate suc…

Cost effective physical register sharing

A Framework for Automated and Controlled Floating-Point Accuracy Reduction in Graphics Applications on GPUs
Reducing the precision of floating-point values can improve performance and/or reduce energy expenditure in computer graphics, among other applications. However, reducing the precision level of floating-point values in a controlled fashio…

Slumber
Leakage power dissipation has become one of the major concerns with technology scaling. The GPGPU register file has grown in size over the last decade in order to support the parallel execution of thousands of threads. Given that each thre…

Hi-End: Hierarchical, Endurance-Aware STT-MRAM-Based Register File for Energy-Efficient GPUs
Modern Graphics Processing Units (GPUs) require large hardware resources for massive parallel thread executions. In particular, modern GPUs have a large register file composed of Static Random Access Memory (SRAM). Due to the high leakage …

Estimating the Failures and Silent Errors Rates of CPUs Across ISAs and Microarchitectures
Silent data corruptions (SDCs) pose a significant challenge to the reliable operation of modern microprocessors. As the need for enhanced performance and reliability continues to grow, it becomes essential to gain insight into the potentia…

Simty: generalized SIMT execution on RISC-V

Persistent Processor Architecture
This paper presents PPA (Persistent Processor Architecture), simple microarchitectural support for lightweight yet performant whole-system persistence. PPA offers fully transparent crash consistency to all sorts of programs, covering the ent…

Efficient Implementation of Many-Ported Memories by Using Standard-Cell Memory Approach
Multi-ported memories are widely used in many applications, such as for high-speed and high-performance parallel computations. While conventional SRAM-based memory macros are limited in both flexibility (e.g., to accommodate a large number…

GOAT: GPU Outsourcing of Deep Learning Training With Asynchronous Probabilistic Integrity Verification Inside Trusted Execution Environment
Machine learning models based on Deep Neural Networks (DNNs) are increasingly deployed in a wide range of applications ranging from self-driving cars to COVID-19 treatment discovery. To support the computational power necessary to learn a …

An Aging-Aware GPU Register File Design Based on Data Redundancy
© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, cr…

Ara: A 1-GHz+ Scalable and Energy-Efficient RISC-V Vector Processor With Multiprecision Floating-Point Support in 22-nm FD-SOI
In this paper, we present Ara, a 64-bit vector processor based on the version 0.5 draft of RISC-V's vector extension, implemented in GlobalFoundries 22FDX FD-SOI technology. Ara's microarchitecture is scalable, as it is composed of a set o…