Chita R. Das
Pushing the Performance Envelope of DNN-based Recommendation Systems Inference on GPUs
Personalized recommendation is a ubiquitous application on the internet, with many industries and hyperscalers extensively leveraging Deep Learning Recommendation Models (DLRMs) for their personalization needs (like ad serving or movie sug…
Salient Store: Enabling Smart Storage for Continuous Learning Edge Servers
As continuous learning-based video analytics continues to evolve, the role of edge servers in efficiently managing vast and dynamic datasets is becoming increasingly crucial. Unlike their compute architecture, storage and archival…
Synergistic and Efficient Edge-Host Communication for Energy Harvesting Wireless Sensor Networks
There is an increasing demand for intelligent processing on ultra-low-power internet of things (IoT) devices. Recent works have shown substantial efficiency boosts by executing inferences directly on the IoT device (node) rather than transm…
Revisiting DNN Training for Intermittently-Powered Energy-Harvesting Micro-Computers
The deployment of Deep Neural Networks in energy-constrained environments, such as Energy Harvesting Wireless Sensor Networks, presents unique challenges, primarily due to the intermittent nature of power availability. To address these cha…
GPU Cluster Scheduling for Network-Sensitive Deep Learning
We propose a novel GPU-cluster scheduler for distributed DL (DDL) workloads that enables proximity-based consolidation of GPU resources based on the DDL jobs' sensitivities to the anticipated communication-network delays. Our scheduler con…
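The core idea in the abstract, placing network-sensitive DDL jobs on closely connected GPUs first, can be illustrated with a small scheduling sketch. This is a hypothetical illustration, not the paper's algorithm: the sensitivity scores, rack names, and the greedy placement rule are all assumed for the example, and the real scheduler reportedly reasons about anticipated communication delays rather than a static score.

```python
from dataclasses import dataclass

@dataclass
class Rack:
    name: str
    free_gpus: int

@dataclass
class Job:
    name: str
    gpus_needed: int
    net_sensitivity: float  # higher = more hurt by cross-rack traffic (assumed metric)

def place_jobs(jobs, racks):
    """Greedy sketch: the most network-sensitive jobs get consolidated placement first.

    Each job is packed onto as few racks as possible; less sensitive jobs take
    whatever capacity remains, even if it is spread out.
    """
    placement = {}
    for job in sorted(jobs, key=lambda j: j.net_sensitivity, reverse=True):
        # Prefer the racks with the most free GPUs (proximity-based consolidation).
        candidates = sorted(racks, key=lambda r: r.free_gpus, reverse=True)
        assigned, remaining = [], job.gpus_needed
        for rack in candidates:
            if remaining == 0:
                break
            take = min(rack.free_gpus, remaining)
            if take > 0:
                rack.free_gpus -= take
                remaining -= take
                assigned.append((rack.name, take))
        if remaining:
            # Sketch-level simplification: give the GPUs back if the job cannot fit now.
            for name, take in assigned:
                next(r for r in racks if r.name == name).free_gpus += take
            placement[job.name] = None
        else:
            placement[job.name] = assigned
    return placement

if __name__ == "__main__":
    racks = [Rack("rack-a", 8), Rack("rack-b", 8), Rack("rack-c", 4)]
    jobs = [Job("resnet-ddl", 8, net_sensitivity=0.9),
            Job("bert-ddl", 6, net_sensitivity=0.4)]
    print(place_jobs(jobs, racks))
```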
Analysis of Distributed Deep Learning in the Cloud
We aim to resolve this problem by introducing a comprehensive distributed deep learning (DDL) profiler, which can determine the various execution "stalls" that DDL suffers from while running on a public cloud. We have implemented the profi…
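A stall-oriented profiler of the kind described can be approximated by timing each phase of a training iteration separately and attributing everything outside pure compute to a stall bucket. The phase names, stall categories, and wall-clock timing approach below are assumptions for illustration, not the profiler built in the paper.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class StallProfiler:
    """Minimal sketch: accumulate wall-clock time per phase of a DDL iteration."""

    def __init__(self):
        self.buckets = defaultdict(float)

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.buckets[name] += time.perf_counter() - start

    def report(self):
        total = sum(self.buckets.values()) or 1.0
        for name, secs in sorted(self.buckets.items(), key=lambda kv: -kv[1]):
            print(f"{name:>10}: {secs:7.3f}s ({100 * secs / total:4.1f}%)")

if __name__ == "__main__":
    prof = StallProfiler()
    for _ in range(3):                      # stand-in for training iterations
        with prof.phase("data_load"):       # I/O and preprocessing stall
            time.sleep(0.02)
        with prof.phase("compute"):         # forward + backward pass
            time.sleep(0.05)
        with prof.phase("comm"):            # gradient-synchronization stall
            time.sleep(0.03)
    prof.report()
```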
Structured in Space, Randomized in Time: Leveraging Dropout in RNNs for Efficient Training
Recurrent Neural Networks (RNNs), more specifically their Long Short-Term Memory (LSTM) variants, have been widely used as a deep learning tool for tackling sequence-based learning tasks in text and speech. Training of such LSTM applicatio…
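For context, the conventional baseline the title plays against is standard per-layer dropout in a stacked LSTM, which PyTorch exposes directly; the snippet below shows only that baseline and does not implement the structured-in-space, randomized-in-time scheme the paper proposes. All hyperparameters are placeholders.

```python
import torch
import torch.nn as nn

# Baseline: a stacked LSTM where PyTorch applies dropout to the outputs of every
# layer except the last (ordinary dropout, not the paper's structured scheme).
model = nn.LSTM(input_size=128, hidden_size=256, num_layers=2,
                dropout=0.5, batch_first=True)

x = torch.randn(32, 50, 128)       # (batch, sequence length, features)
out, (h_n, c_n) = model(x)         # dropout is active because the module is in training mode
print(out.shape)                   # torch.Size([32, 50, 256])
```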
Kraken
The growing popularity of microservices has led to the proliferation of online cloud service-based applications, which are typically modelled as Directed Acyclic Graphs (DAGs) comprising tens to hundreds of microservices. The vast major…
Exploiting Activation based Gradient Output Sparsity to Accelerate Backpropagation in CNNs
Machine/deep-learning (ML/DL) based techniques are emerging as a driving force behind many cutting-edge technologies, achieving high accuracy on computer vision workloads such as image classification and object detection. However, training…
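The sparsity being exploited follows from a standard fact: wherever a ReLU outputs zero, the gradient flowing back through it is also zero. The short check below demonstrates that property with PyTorch; how the paper's design actually skips those zero gradients during backpropagation is not shown here.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 16, 32, 32, requires_grad=True)   # a convolutional feature map
y = F.relu(x)
loss = y.sum()
loss.backward()

act_zeros = (y == 0).float().mean().item()
grad_zeros = (x.grad == 0).float().mean().item()
print(f"zero activations: {act_zeros:.2%}, zero input-gradients: {grad_zeros:.2%}")
# The two fractions match: backpropagation through ReLU yields a gradient exactly
# as sparse as the activation map, which is the sparsity a backward pass can skip.
```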
Cocktail: Leveraging Ensemble Learning for Optimized Model Serving in Public Cloud
With a growing demand for adopting ML models for a variety of application services, it is vital that the frameworks serving these models are capable of delivering highly accurate predictions with minimal latency along with reduced deploymen…
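The trade-off the abstract alludes to, accuracy versus latency and deployment cost when choosing which models to serve together, can be sketched as a tiny selection problem. The model names, numbers, and the greedy accuracy-first rule are illustrative assumptions, not Cocktail's actual policy.

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    accuracy: float    # standalone validation accuracy (assumed numbers)
    latency_ms: float  # tail inference latency
    cost: float        # relative deployment cost

def pick_ensemble(models, latency_budget_ms, max_cost):
    """Greedy sketch: add the most accurate affordable models while every member
    (assumed to run in parallel) stays within the latency budget."""
    chosen, spent = [], 0.0
    for m in sorted(models, key=lambda m: m.accuracy, reverse=True):
        if m.latency_ms <= latency_budget_ms and spent + m.cost <= max_cost:
            chosen.append(m)
            spent += m.cost
    return chosen

if __name__ == "__main__":
    zoo = [Model("resnet50", 0.76, 22, 1.0),
           Model("mobilenet", 0.71, 6, 0.3),
           Model("efficientnet", 0.79, 35, 1.4)]
    ensemble = pick_ensemble(zoo, latency_budget_ms=30, max_cost=2.0)
    print([m.name for m in ensemble])   # ['resnet50', 'mobilenet']
```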
CASH: A Credit Aware Scheduling for Public Cloud Platforms
The public cloud offers a myriad of services that allow its tenants to process large-scale big data in a flexible, easy, and cost-effective manner. Tenants generally use large-scale data processing frameworks such as MapReduce, Tez, Spark…
Fifer: Tackling Underutilization in the Serverless Era
Datacenters are witnessing a rapid surge in the adoption of serverless functions for microservices-based applications. A vast majority of these microservices typically span less than a second, have strict SLO requirements, and are chained …
Towards Designing a Self-Managed Machine Learning Inference Serving System in Public Cloud
We are witnessing an increasing trend towards using Machine Learning (ML) based prediction systems, spanning across different application domains, including product recommendation systems, personal assistant devices, facial recognition, etc.…
Multiverse: Dynamic VM Provisioning for Virtualized High Performance Computing Clusters
Traditionally, HPC workloads have been deployed in bare-metal clusters; but the advances in virtualization have led the pathway for these workloads to be deployed in virtualized clusters. However, HPC cluster administrators/providers still…
Distilling the Essence of Raw Video to Reduce Memory Usage and Energy at Edge Devices
Video broadcast and streaming are among the most widely used applications for edge devices. Roughly 82% of the mobile internet traffic is made up of video data. This is likely to worsen with the advent of 5G that will open up new opportuni…
Quantifying Data Locality in Dynamic Parallelism in GPUs
Dynamic parallelism (DP) is a new feature of emerging GPUs that allows new kernels to be generated and scheduled from the device-side (GPU) without the host-side (CPU) intervention. To efficiently support DP, one of the major challenges is…
Opportunistic computing in GPU architectures
Data transfer overhead between computing cores and memory hierarchy has been a persistent issue for von Neumann architectures and the problem has only become more challenging with the emergence of manycore systems. A conceptually powerful …
SOML Read
NAND-based solid-state disks (SSDs) are known for their superior random read/write performance due to the high degrees of multi-chip parallelism they exhibit. Currently, as the chip density increases dramatically, fewer 3D NAND chips are n…
Quantifying Data Locality in Dynamic Parallelism in GPUs
GPUs are becoming prevalent in various domains of computing and are widely used for streaming (regular) applications. However, they are highly inefficient when executing irregular applications with unstructured inputs due to load imbalance…
The Curious Case of Container Orchestration and Scheduling in GPU-based Datacenters
Modern data centers are increasingly being provisioned with compute accelerators such as GPUs, FPGAs, and ASICs to catch up with the workload performance demands and reduce the total cost of ownership (TCO). By 2021, traffic within hypersc…
FLOSS
Today's mobile platforms have grown in sophistication to run a wide variety of frame-based applications. To deliver better QoS and energy efficiency, these applications utilize multi-flow execution, which exploits hardware-level parallelis…
Parallelizing garbage collection with I/O to improve flash resource utilization
Garbage Collection (GC) has been a critical optimization target for improving the performance of flash-based Solid State Drives (SSDs); the long-lasting GC process occupies the flash resources, thereby blocking normal I/O requests and incr…
Holistic Management of the GPGPU Memory Hierarchy to Manage Warp-level Latency Tolerance
In a modern GPU architecture, all threads within a warp execute the same instruction in lockstep. For a memory instruction, this can lead to memory divergence: the memory requests for some threads are serviced early, while the remaining re…