Explanipedia

ZombieLoad Open

Michael Schwarz, Moritz Lipp, Daniel Moghimi, Jo Van Bulck, Julian Stecklina, et al. · 2019

Computer science Mathematics

In early 2018, Meltdown first showed how to read arbitrary kernel memory from user space by exploiting side-effects from transient instructions. While this attack has been mitigated through stronger isolation boundaries between user and ke…

SoK: Shining Light on Shadow Stacks Open

Nathan Burow, Xinping Zhang, Mathias Payer · 2019

Computer science Psychology

Control-Flow Hijacking attacks are the dominant attack vector against C/C++ programs. Control-Flow Integrity (CFI) solutions mitigate these attacks on the forward edge, i.e., indirect calls through function pointers and virtual calls. Prot…

Ara: A 1-GHz+ Scalable and Energy-Efficient RISC-V Vector Processor with Multiprecision Floating-Point Support in 22-nm FD-SOI Open

Matheus Cavalcante, Fabian Schuiki, Florian Zaruba, Michael Schaffner, Luca Benini · 2020

Computer science Mathematics Physics

In this article, we present Ara, a 64-bit vector processor based on the version 0.5 draft of RISC-V's vector extension, implemented in GlobalFoundries 22FDX fully depleted silicon-on-insulator (FD-SOI) technology. Ara's microarchitecture i…

Snitch: A Tiny Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads Open

Florian Zaruba, Fabian Schuiki, Torsten Hoefler, Luca Benini · 2020

Computer science Mathematics Engineering

Data-parallel applications, such as data analytics, machine learning, and scientific computing, are placing an ever-growing demand on floating-point operations per second on emerging systems. With increasing integration density, the que…

ZombieLoad: Cross-Privilege-Boundary Data Sampling Open

Michael Schwarz, Moritz Lipp, Daniel Moghimi, Jo Van Bulck, Julian Stecklina, et al. · 2019

Computer science

In early 2018, Meltdown first showed how to read arbitrary kernel memory from user space by exploiting side-effects from transient instructions. While this attack has been mitigated through stronger isolation boundaries between user and ke…

Vitruvius+: An Area-Efficient RISC-V Decoupled Vector Coprocessor for High Performance Computing Applications Open

Francesco Minervini, Oscar Palomar, Osman Ünsal, Enrico Reggiani, Josue V. Quiroga, et al. · 2022

Computer science

The maturity level of RISC-V and the availability of domain-specific instruction set extensions, like vector processing, make RISC-V a good candidate for supporting the integration of specialized hardware in processor cores for the High Pe…

Stream Semantic Registers: A Lightweight RISC-V ISA Extension Achieving Full Compute Utilization in Single-Issue Cores Open

Fabian Schuiki, Florian Zaruba, Torsten Hoefler, Luca Benini · 2021

Computer science

Single-issue processor cores are very energy efficient but suffer from the von Neumann bottleneck, in that they must explicitly fetch and issue the loads/stores necessary to feed their ALU/FPU. Each instruction spent on moving data is a cy…

A Survey of Techniques for Architecting and Managing GPU Register File Open

Sparsh Mittal · 2016

Computer science Art Engineering

To support their massively-multithreaded architecture, GPUs use very large register file (RF) which has a capacity higher than even L1 and L2 caches. In total contrast, traditional CPUs use tiny RF and much larger caches to optimize latenc…

DNN Dataflow Choice Is Overrated. Open

Xuan Yang, Mingyu Gao, Jing Pu, Ankita Nayak, Qiaoyi Liu, et al. · 2018

Computer science Economics Engineering

Many DNN accelerators have been proposed and built using different microarchitectures and program mappings. To fairly compare these different approaches, we modified the Halide compiler to produce hardware as well as CPU and GPU code, and …

LTRF Open

Mohammad Sadrosadati, Amirhossein Mirhosseini, Seyed Borna Ehsani, Hamid Sarbazi‐Azad, Mario Drumond, et al. · 2018

Computer science

Graphics Processing Units (GPUs) employ large register files to accommodate all active threads and accelerate context switching. Unfortunately, register files are a scalability bottleneck for future GPUs due to long access latency, high po…

Precise Runahead Execution Open

Ajeya Naithani, Josué Feliu, Almutaz Adileh, Lieven Eeckhout · 2020

Computer science Engineering

Runahead execution improves processor performance by accurately prefetching long-latency memory accesses. When a long-latency load causes the instruction window to fill up and halt the pipeline, the processor enters runahead mode and keeps…

CORF Open

Hodjat Asghari Esfeden, Farzad Khorasani, Hyeran Jeon, Daniel Wong, Nael Abu‐Ghazaleh · 2019

Computer science Philosophy

The Register File (RF) in GPUs is a critical structure that maintains the state for thousands of threads that support the GPU processing model. The RF organization substantially affects the overall performance and the energy efficiency of …

Stencil codes on a vector length agnostic architecture Open

Adrià Armejach, Helena Caminal, Juan M. Cebrián, R. Gonzalez-Alberquilla, Chris Adeniyi-Jones, et al. · 2018

Computer science

Data-level parallelism is frequently ignored or underutilized. Achieved through vector/SIMD capabilities, it can provide substantial performance improvements on top of widely used techniques such as thread-level parallelism. However, manua…

Linebacker Open

Yunho Oh, Gunjae Koo, Murali Annavaram, Won Woo Ro · 2019

Computer science

Modern GPUs suffer from cache contention due to the limited cache size that is shared across tens of concurrently running warps. To increase the per-warp cache size prior techniques proposed warp throttling which limits the number of activ…

A survey of techniques for designing and managing CPU register file Open

Sparsh Mittal · 2016

Computer science Physics Philosophy

Processor register file (RF) is an important microarchitectural component used for storing operands and results of instructions. The design and operation of RF have crucial impact on the performance, energy efficiency, and reliabil…

AVPP Open

Lois Orosa, Rodolfo Azevedo, Onur Mutlu · 2018

Computer science

Value prediction improves instruction level parallelism in superscalar processors by breaking true data dependencies. Although this technique can significantly improve overall performance, most of the state-of-the-art value prediction appr…

Using Arm’s scalable vector extension on stencil codes Open

Adrià Armejach, Helena Caminal, Juan M. Cebrián, Rubén Langarita, R. Gonzalez-Alberquilla, et al. · 2019

Computer science

Data-level parallelism is frequently ignored or underutilized. Achieved through vector/SIMD capabilities, it can provide substantial performance improvements on top of widely used techniques such as thread-level parallelism. However, manua…

Nwise and Pwise: 10T Radiation Hardened SRAM Cells for Space Applications With High Reliability Requirements Open

Azam Seyedi, Snorre Aunet, Per Gunnar Kjeldsberg · 2022

Computer science Engineering Physics

SRAM cells are widely used to design memory blocks of, e.g., caches, register files, and translation lookaside buffers. Depending on the SRAM application, the design requirements are different. For instance, in space applications, alongsid…

All-Digital Energy-Constrained Controller for General-Purpose Accelerators and CPUs Open

Davide Zoni, Luca Cremona, William Fornaciari · 2019

Computer science Engineering

Considering the energy-cap problem in battery-powered devices, DVFS and power gating represent the de facto state-of-the-art actuators. However, the limited margin to reduce the operating voltage, the impossibility to massively integrate suc…

Cost effective physical register sharing Open

Arthur Pérais, André Seznec · 2016

Computer science Philosophy

A Framework for Automated and Controlled Floating-Point Accuracy Reduction in Graphics Applications on GPUs Open

Alexandra Angerd, Erik Sintorn, Per Stenström · 2017

Computer science Mathematics

Reducing the precision of floating-point values can improve performance and/or reduce energy expenditure in computer graphics, among other, applications. However, reducing the precision level of floating-point values in a controlled fashio…

Slumber Open

Devashree Tripathy, Hadi Zamani, Debiprasanna Sahoo, Laxmi N. Bhuyan, Manoranjan Satpathy · 2020

Computer science Engineering Economics

The leakage power dissipation has become one of the major concerns with technology scaling. The GPGPU register file has grown in size over last decade in order to support the parallel execution of thousands of threads. Given that each thre…

Hi-End: Hierarchical, Endurance-Aware STT-MRAM-Based Register File for Energy-Efficient GPUs Open

Won Jeon, Jun Hyun Park, Yoonsoo Kim, Gunjae Koo, Won Woo Ro · 2020

Computer science Engineering Philosophy

Modern Graphics Processing Units (GPUs) require large hardware resources for massive parallel thread executions. In particular, modern GPUs have a large register file composed of Static Random Access Memory (SRAM). Due to the high leakage …

Estimating the Failures and Silent Errors Rates of CPUs Across ISAs and Microarchitectures Open

Dimitris Gizopoulos, George N. Papadimitriou, Odysseas Chatzopoulos · 2023

Computer science Biology Physics

Silent data corruptions (SDCs) pose a significant challenge to the reliable operation of modern microprocessors. As the need for enhanced performance and reliability continues to grow, it becomes essential to gain insight into the potentia…

Simty: generalized SIMT execution on RISC-V Open

Caroline Collange · 2017

Computer science Art

Persistent Processor Architecture Open

Jianping Zeng, Jungi Jeong, Changhee Jung · 2023

Computer science Art

This paper presents PPA (Persistent Processor Architecture), simple microarchitectural support for lightweight yet performant whole-system persistence. PPA offers fully transparent crash consistency to all sorts of program covering the ent…

Efficient Implementation of Many-Ported Memories by Using Standard-Cell Memory Approach Open

Hanan Marinberg, Esteban Garzón, Tzachi Noy, Marco Lanuzza, Adam Teman · 2023

Computer science Mathematics Geography

Multi-ported memories are widely used in many applications, such as for high-speed and high-performance parallel computations. While conventional SRAM-based memory macros are limited in both flexibility (e.g., to accommodate a large number…

GOAT: GPU Outsourcing of Deep Learning Training With Asynchronous Probabilistic Integrity Verification Inside Trusted Execution Environment Open

Aref Asvadishirehjini, Murat Kantarcıoğlu, Bradley Malin · 2020

Computer science Materials science Political science

Machine learning models based on Deep Neural Networks (DNNs) are increasingly deployed in a wide range of applications ranging from self-driving cars to COVID-19 treatment discovery. To support the computational power necessary to learn a …

An Aging-Aware GPU Register File Design Based on Data Redundancy Open

Alejandro Valero, Francisco Candel, Darío Suárez Gracia, Salvador Petit, Julio Sahuquillo · 2018

Computer science Engineering

"© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, cr…

Ara: A 1-GHz+ Scalable and Energy-Efficient RISC-V Vector Processor With Multiprecision Floating-Point Support in 22-nm FD-SOI Open

Matheus Cavalcante, Fabian Schuiki, Florian Zaruba, Michael Schaffner, Luca Benini · 2019

Computer science Mathematics Physics

In this paper, we present Ara, a 64-bit vector processor based on the version 0.5 draft of RISC-V's vector extension, implemented in GlobalFoundries 22FDX FD-SOI technology. Ara's microarchitecture is scalable, as it is composed of a set o…

Register file ≈ Register file