
BLAZE: Exploiting Hybrid Parallelism and Size-customized Kernels to Accelerate BLASTP on GPUs

T. N. Vijaykumar · 2025

Efficient Sparse Processing-in-Memory Architecture (ESPIM) for Machine Learning Inference

Mingxuan He, Mithuna Thottethodi, T. N. Vijaykumar · 2024

Emerging machine learning (ML) models (e.g., transformers) involve memory pin bandwidth-bound matrix-vector (MV) computation in inference. By avoiding pin crossings, processing in memory (PIM) can improve performance and energy for pin-bou…

QED: Scalable Verification of Hardware Memory Consistency

Gokulan Ravi, Xiaokang Qiu, Mithuna Thottethodi, T. N. Vijaykumar · 2024

Memory consistency model (MCM) issues in out-of-order-issue microprocessor-based shared-memory systems are notoriously non-intuitive and a source of hardware design bugs. Prior hardware verification work is limited to in-order-issue proces…

Eureka: Efficient Tensor Cores for One-sided Unstructured Sparsity in DNN Inference

Ashish Gondimalla, Mithuna Thottethodi, T. N. Vijaykumar · 2023

Deep neural networks (DNNs), while enormously popular, continue to place ever higher compute demand for which GPUs provide specialized matrix multipliers called tensor cores. To reduce the compute demand via sparsity, Nvidia Ampere's tenso…

SafeBet: Secure, Simple, and Fast Speculative Execution

Conor James Green, Cole Nelson, Mithuna Thottethodi, T. N. Vijaykumar · 2023

Spectre attacks exploit microprocessor speculative execution to read and transmit forbidden data outside the attacker's trust domain and sandbox. Recent hardware schemes allow potentially-unsafe speculative accesses but prevent the secret'…

Occam: Optimal Data Reuse for Convolutional Neural Networks

Ashish Gondimalla, Jianqiao Liu, Mithuna Thottethodi, T. N. Vijaykumar · 2022

Convolutional neural networks (CNNs) are emerging as powerful tools for image processing in important commercial applications. We focus on the important problem of improving the latency of image recognition. While CNNs are highly amenable …

Toward Reliable and Efficient Exascale Computing

Yung Ryn Choe, Rudolf Eigenmann, T. N. Vijaykumar, Seyong Lee, Vijay S. Pai, et al. · 2021

FastZ

Sree Charan Gundabolu, T. N. Vijaykumar, Mithuna Thottethodi · 2021

Recognizing the importance of whole genome alignment (WGA), the National Institutes of Health maintains LASTZ, a sequential WGA application. As genomic data grows, there is a compelling need for scalable, high-performance WGA. Unfortunate…

OCCAM: Optimal Data Reuse for Convolutional Neural Networks

Ashish Gondimalla, Jianqiao Liu, T. N. Vijaykumar, Mithuna Thottethodi · 2021

Convolutional neural networks (CNNs) are emerging as powerful tools for image processing in important commercial applications. We focus on the important problem of improving the latency of image recognition. CNNs' large data at each layer'…

Barrier-Free Large-Scale Sparse Tensor Accelerator (BARISTA) For Convolutional Neural Networks

Ashish Gondimalla, Sree Charan Gundabolu, T. N. Vijaykumar, Mithuna Thottethodi · 2021

Convolutional neural networks (CNNs) are emerging as powerful tools for visual recognition. Recent architecture proposals for sparse CNNs exploit zeros in the feature maps and filters for performance and energy without losing accuracy. Spa…

Booster: An Accelerator for Gradient Boosting Decision Trees

Mingxuan He, T. N. Vijaykumar, Mithuna Thottethodi · 2020

We propose Booster, a novel accelerator for gradient boosting trees based on the unique characteristics of gradient boosting models. We observe that the dominant steps of gradient boosting training (accounting for 90-98% of training time) …

Attention-based Joint Detection of Object and Semantic Part

Keval Morabia, Jatin Arora, T. N. Vijaykumar · 2020

In this paper, we address the problem of jointly detecting objects, such as a dog, and their semantic parts, such as the face and legs. Our model is built on top of two Faster-RCNN models that share their features to perform a novel Attention-based fea…

Network Interface Architecture for Remote Indirect Memory Access (RIMA) in Datacenters

Jiachen Xue, T. N. Vijaykumar, Mithuna Thottethodi · 2020

Remote Direct Memory Access (RDMA) fabrics such as InfiniBand and Converged Ethernet report latency shorter by a factor of 50 than TCP. As such, RDMA is a potential replacement for TCP in datacenters (DCs) running low-latency applications,…

Fast Congestion Control in RDMA-based Datacenter Networks

Jiachen Xue, Muhammad Usama Chaudhry, Balajee Vamanan, T. N. Vijaykumar, Mithuna Thottethodi · 2018

Dart: Divide and Specialize for Fast Response to Congestion in RDMA-based Datacenter Networks

Jiachen Xue, Muhammad Usama Chaudhry, Balajee Vamanan, T. N. Vijaykumar, Mithuna Thottethodi · 2018

Though Remote Direct Memory Access (RDMA) promises to reduce datacenter network latencies significantly compared to TCP (e.g., 10x), end-to-end congestion control in the presence of incasts is a challenge. Targeting the full generality of …

Hydra: Leveraging Functional Slicing for Efficient Distributed SDN Controllers

Yiyang Chang, Ashkan Rezaei, Balajee Vamanan, Jahangir Hasan, Sanjay Rao, et al. · 2016

The conventional approach to scaling Software Defined Networking (SDN) controllers today is to partition switches based on network topology, with each partition being controlled by a single physical controller, running all SDN applications…

Achieving Causal Consistency under Partial Replication for Geo-distributed Cloud Storage

Tariq Mahmood, Shankaranarayanan Puzhavakath Narayanan, Sanjay Rao, T. N. Vijaykumar, Mithuna Thottethodi · 2016

Causal consistency has emerged as an attractive middle-ground to architecting cloud storage systems, as it allows for high availability and low latency, while supporting stronger-than-eventual-consistency semantics. However, causally-consi…

TimeTrader

Balajee Vamanan, Hamza Bin Sohail, Jahangir Hasan, T. N. Vijaykumar · 2015

Online Search (OLS) is a key component of many popular Internet services. Datacenters running OLS consume significant amounts of energy. However, reducing their energy is challenging due to their tight response time requirements. A key asp…

MigrantStore: Leveraging Virtual Memory in DRAM-PCM Memory Architecture

Hamza Bin Sohail, Balajee Vamanan, T. N. Vijaykumar · 2015

With the imminent slowing down of DRAM scaling, Phase Change Memory (PCM) is emerging as a lead alternative for main memory technology. While PCM achieves low energy due to various technology-specific advantages, PCM is significantly slowe…

TimeTrader: Exploiting Latency Tail to Save Datacenter Energy for On-line Data-Intensive Applications

Balajee Vamanan, Hamza Bin Sohail, Jahangir Hasan, T. N. Vijaykumar · 2015

Datacenters running on-line, data-intensive applications (OLDIs) consume significant amounts of energy. However, reducing their energy is challenging due to their tight response time requirements. A key aspect of OLDIs is that each user qu…

Stratified Online Sampling for Sound Approximation in MapReduce

Mithuna Thottethodi, T. N. Vijaykumar, Milind Kulkarni, Nitin Nitin · 2015
