T. N. Vijaykumar
BLAZE: Exploiting Hybrid Parallelism and Size-customized Kernels to Accelerate BLASTP on GPUs
Efficient Sparse Processing-in-Memory Architecture (ESPIM) for Machine Learning Inference
Emerging machine learning (ML) models (e.g., transformers) involve memory pin bandwidth-bound matrix-vector (MV) computation in inference. By avoiding pin crossings, processing in memory (PIM) can improve performance and energy for pin-bou…
QED: Scalable Verification of Hardware Memory Consistency
Memory consistency model (MCM) issues in out-of-order-issue microprocessor-based shared-memory systems are notoriously non-intuitive and a source of hardware design bugs. Prior hardware verification work is limited to in-order-issue proces…
Eureka: Efficient Tensor Cores for One-sided Unstructured Sparsity in DNN Inference
Deep neural networks (DNNs), while enormously popular, continue to place ever higher compute demand for which GPUs provide specialized matrix multipliers called tensor cores. To reduce the compute demand via sparsity, Nvidia Ampere's tenso…
SafeBet: Secure, Simple, and Fast Speculative Execution
Spectre attacks exploit microprocessor speculative execution to read and transmit forbidden data outside the attacker's trust domain and sandbox. Recent hardware schemes allow potentially-unsafe speculative accesses but prevent the secret'…
Occam: Optimal Data Reuse for Convolutional Neural Networks
Convolutional neural networks (CNNs) are emerging as powerful tools for image processing in important commercial applications. We focus on the important problem of improving the latency of image recognition. While CNNs are highly amenable …
Toward Reliable and Efficient Exascale Computing.
FastZ
Recognizing the importance of whole genome alignment (WGA), the National Institutes of Health maintains LASTZ, a sequential WGA application. As genomic data grows, there is a compelling need for scalable, high-performance WGA. Unfortunate…
OCCAM: Optimal Data Reuse for Convolutional Neural Networks
Convolutional neural networks (CNNs) are emerging as powerful tools for image processing in important commercial applications. We focus on the important problem of improving the latency of image recognition. CNNs' large data at each layer'…
Barrier-Free Large-Scale Sparse Tensor Accelerator (BARISTA) For Convolutional Neural Networks
Convolutional neural networks (CNNs) are emerging as powerful tools for visual recognition. Recent architecture proposals for sparse CNNs exploit zeros in the feature maps and filters for performance and energy without losing accuracy. …
Barrier-Free Large-Scale Sparse Tensor Accelerator (BARISTA) For Convolutional Neural Networks
Convolutional neural networks (CNNs) are emerging as powerful tools for visual recognition. Recent architecture proposals for sparse CNNs exploit zeros in the feature maps and filters for performance and energy without losing accuracy. Spa…
Booster: An Accelerator for Gradient Boosting Decision Trees
We propose Booster, a novel accelerator for gradient boosting trees based on the unique characteristics of gradient boosting models. We observe that the dominant steps of gradient boosting training (accounting for 90-98% of training time) …
Attention-based Joint Detection of Object and Semantic Part
In this paper, we address the problem of joint detection of objects (e.g., a dog) and their semantic parts (e.g., face, leg). Our model is created on top of two Faster-RCNN models that share their features to perform a novel Attention-based fea…
Network Interface Architecture for Remote Indirect Memory Access (RIMA) in Datacenters
Remote Direct Memory Access (RDMA) fabrics such as InfiniBand and Converged Ethernet report latency shorter by a factor of 50 than TCP. As such, RDMA is a potential replacement for TCP in datacenters (DCs) running low-latency applications,…
Fast Congestion Control in RDMA-based Datacenter Networks
Short paper. Authors: Jaichen Xue (Purdue University), Muhammad Usama Chaudhry (University of Illinois at Chicago), …
Dart: Divide and Specialize for Fast Response to Congestion in RDMA-based Datacenter Networks
Though Remote Direct Memory Access (RDMA) promises to reduce datacenter network latencies significantly compared to TCP (e.g., 10x), end-to-end congestion control in the presence of incasts is a challenge. Targeting the full generality of …
Hydra: Leveraging Functional Slicing for Efficient Distributed SDN Controllers
The conventional approach to scaling Software Defined Networking (SDN) controllers today is to partition switches based on network topology, with each partition being controlled by a single physical controller, running all SDN applications…
Achieving Causal Consistency under Partial Replication for Geo-distributed Cloud Storage
Causal consistency has emerged as an attractive middle-ground to architecting cloud storage systems, as it allows for high availability and low latency, while supporting stronger-than-eventual-consistency semantics. However, causally-consi…
TimeTrader
Online Search (OLS) is a key component of many popular Internet services. Datacenters running OLS consume significant amounts of energy. However, reducing their energy is challenging due to their tight response time requirements. A key asp…
MigrantStore: Leveraging Virtual Memory in DRAM-PCM Memory Architecture
With the imminent slowing down of DRAM scaling, Phase Change Memory (PCM) is emerging as a lead alternative for main memory technology. While PCM achieves low energy due to various technology-specific advantages, PCM is significantly slowe…
TimeTrader: Exploiting Latency Tail to Save Datacenter Energy for On-line Data-Intensive Applications
Datacenters running on-line, data-intensive applications (OLDIs) consume significant amounts of energy. However, reducing their energy is challenging due to their tight response time requirements. A key aspect of OLDIs is that each user qu…
Stratified Online Sampling for Sound Approximation in MapReduce