Ajeya Naithani
Scalar Vector Runahead: Removing the Shackles of Indirect Memory Chains on In-Order Cores
Modern processors often face the memory wall as a bottleneck, an exacerbated problem for stall-on-use in-order cores. Despite this limitation, there is growing demand for energy-efficient in-order cores due to privacy and sustainability co…
The Architectural Sustainability Indicator
Computing devices are responsible for a significant fraction of the world's total carbon footprint. Designing sustainable systems is a challenging endeavor because of the huge design space, the complex objective function, and the inherent …
Scalar Vector Runahead
Modern graph and database processing typically takes place on high-end servers in data centers. However, with growing concerns of data privacy, trustworthiness, and all-time connectivity, there has been a shift toward increased analytics p…
Decoupled Vector Runahead for Prefetching Nested Memory-Access Chains
Decoupled vector runahead (DVR) exploits massive amounts of memory-level parallelism to improve the performance of applications that feature indirect memory accesses by dynamically inferring loop bounds at runtime, recognizing striding loa…
Decoupled Vector Runahead
We present Decoupled Vector Runahead (DVR), an in-core prefetching technique, executing separately to the main application thread, that exploits massive amounts of memory-level parallelism to improve the performance of applications featuri…
Vector Runahead for Indirect Memory Accesses
Vector runahead delivers extremely high memory-level parallelism even for the chains of dependent memory accesses with complex intermediate address computation, which conventional runahead techniques fundamentally cannot handle and, theref…
The Forward Slice Core: A High-Performance, Yet Low-Complexity Microarchitecture
Superscalar out-of-order cores deliver high performance at the cost of increased complexity and power budget. In-order cores, in contrast, are less complex and have a smaller power budget, but offer low performance. A processor architectur…
VMT: Virtualized Multi-Threading for Accelerating Graph Workloads on Commodity Processors
Modern-day graph workloads operate on huge graphs through pointer chasing which leads to high last-level cache (LLC) miss rates and limited memory-level parallelism (MLP). Simultaneous Multi-Threading (SMT) effectively hides the memor…
Vector Runahead
The memory wall places a significant limit on performance for many modern workloads. These applications feature complex chains of dependent, indirect memory accesses, which cannot be picked up by even the most advanced microarchitectural p…
The Forward Slice Core Microarchitecture
Superscalar out-of-order cores deliver high performance at the cost of increased complexity and power budget. In-order cores, in contrast, are less complex and have a smaller power budget, but offer low performance. A processor architectur…
Precise Runahead Execution
Runahead execution improves processor performance by accurately prefetching long-latency memory accesses. When a long-latency load causes the instruction window to fill up and halt the pipeline, the processor enters runahead mode and keeps…
Precise Runahead Execution
© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, cre…
Optimizing Soft Error Reliability Through Scheduling on Heterogeneous Multicore Processors
Reliability to soft errors is an increasingly important issue as technology continues to shrink. In this paper, we show that applications exhibit different reliability characteristics on big, high-performance cores versus small, power-effi…