Chita R. Das
Pushing the Performance Envelope of DNN-based Recommendation Systems Inference on GPUs
Personalized recommendation is a ubiquitous application on the internet, with many industries and hyperscalers extensively leveraging Deep Learning Recommendation Models (DLRMs) for their personalization needs (like ad serving or movie sug…
Salient Store: Enabling Smart Storage for Continuous Learning Edge Servers
As continuous learning-based video analytics continues to evolve, the role of edge servers in efficiently managing vast and dynamic datasets is becoming increasingly crucial. Unlike their compute architecture, storage and archival…
Synergistic and Efficient Edge-Host Communication for Energy Harvesting Wireless Sensor Networks
There is an increasing demand for intelligent processing on ultra-low-power internet of things (IoT) devices. Recent works have shown substantial efficiency boosts by executing inferences directly on the IoT device (node) rather than transm…
Revisiting DNN Training for Intermittently-Powered Energy-Harvesting Micro-Computers
The deployment of Deep Neural Networks in energy-constrained environments, such as Energy Harvesting Wireless Sensor Networks, presents unique challenges, primarily due to the intermittent nature of power availability. To address these cha…
GPU Cluster Scheduling for Network-Sensitive Deep Learning
We propose a novel GPU-cluster scheduler for distributed DL (DDL) workloads that enables proximity-based consolidation of GPU resources based on the DDL jobs' sensitivities to the anticipated communication-network delays. Our scheduler con…
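The core idea in the abstract, placing network-sensitive DDL jobs on closely connected GPUs first, can be illustrated with a small scheduling sketch. This is a hypothetical illustration, not the paper's algorithm: the sensitivity scores, rack names, and the greedy placement rule are all assumed for the example, and the real scheduler reportedly reasons about anticipated communication delays rather than a static score.

```python
from dataclasses import dataclass

@dataclass
class Rack:
    name: str
    free_gpus: int

@dataclass
class Job:
    name: str
    gpus_needed: int
    net_sensitivity: float  # higher = more hurt by cross-rack traffic (assumed metric)

def place_jobs(jobs, racks):
    """Greedy sketch: the most network-sensitive jobs get consolidated placement first.

    Each job is packed onto as few racks as possible; less sensitive jobs take
    whatever capacity remains, even if it is spread out.
    """
    placement = {}
    for job in sorted(jobs, key=lambda j: j.net_sensitivity, reverse=True):
        # Prefer the racks with the most free GPUs (proximity-based consolidation).
        candidates = sorted(racks, key=lambda r: r.free_gpus, reverse=True)
        assigned, remaining = [], job.gpus_needed
        for rack in candidates:
            if remaining == 0:
                break
            take = min(rack.free_gpus, remaining)
            if take > 0:
                rack.free_gpus -= take
                remaining -= take
                assigned.append((rack.name, take))
        if remaining:
            # Sketch-level simplification: give the GPUs back if the job cannot fit now.
            for name, take in assigned:
                next(r for r in racks if r.name == name).free_gpus += take
            placement[job.name] = None
        else:
            placement[job.name] = assigned
    return placement

if __name__ == "__main__":
    racks = [Rack("rack-a", 8), Rack("rack-b", 8), Rack("rack-c", 4)]
    jobs = [Job("resnet-ddl", 8, net_sensitivity=0.9),
            Job("bert-ddl", 6, net_sensitivity=0.4)]
    print(place_jobs(jobs, racks))
```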
Analysis of Distributed Deep Learning in the Cloud
We aim to resolve this problem by introducing a comprehensive distributed deep learning (DDL) profiler, which can determine the various execution "stalls" that DDL suffers from while running on a public cloud. We have implemented the profi…
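A stall-oriented profiler of the kind described can be approximated by timing each phase of a training iteration separately and attributing everything outside pure compute to a stall bucket. The phase names, stall categories, and wall-clock timing approach below are assumptions for illustration, not the profiler built in the paper.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class StallProfiler:
    """Minimal sketch: accumulate wall-clock time per phase of a DDL iteration."""

    def __init__(self):
        self.buckets = defaultdict(float)

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.buckets[name] += time.perf_counter() - start

    def report(self):
        total = sum(self.buckets.values()) or 1.0
        for name, secs in sorted(self.buckets.items(), key=lambda kv: -kv[1]):
            print(f"{name:>10}: {secs:7.3f}s ({100 * secs / total:4.1f}%)")

if __name__ == "__main__":
    prof = StallProfiler()
    for _ in range(3):                      # stand-in for training iterations
        with prof.phase("data_load"):       # I/O and preprocessing stall
            time.sleep(0.02)
        with prof.phase("compute"):         # forward + backward pass
            time.sleep(0.05)
        with prof.phase("comm"):            # gradient-synchronization stall
            time.sleep(0.03)
    prof.report()
```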
Structured in Space, Randomized in Time: Leveraging Dropout in RNNs for Efficient Training
Recurrent Neural Networks (RNNs), more specifically their Long Short-Term Memory (LSTM) variants, have been widely used as a deep learning tool for tackling sequence-based learning tasks in text and speech. Training of such LSTM applicatio…
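For context, the conventional baseline the title plays against is standard per-layer dropout in a stacked LSTM, which PyTorch exposes directly; the snippet below shows only that baseline and does not implement the structured-in-space, randomized-in-time scheme the paper proposes. All hyperparameters are placeholders.

```python
import torch
import torch.nn as nn

# Baseline: a stacked LSTM where PyTorch applies dropout to the outputs of every
# layer except the last (ordinary dropout, not the paper's structured scheme).
model = nn.LSTM(input_size=128, hidden_size=256, num_layers=2,
                dropout=0.5, batch_first=True)

x = torch.randn(32, 50, 128)       # (batch, sequence length, features)
out, (h_n, c_n) = model(x)         # dropout is active because the module is in training mode
print(out.shape)                   # torch.Size([32, 50, 256])
```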
Kraken
The growing popularity of microservices has led to the proliferation of online cloud service-based applications, which are typically modelled as Directed Acyclic Graphs (DAGs) comprising tens to hundreds of microservices. The vast major…
Exploiting Activation based Gradient Output Sparsity to Accelerate Backpropagation in CNNs
Machine/deep-learning (ML/DL) based techniques are emerging as a driving force behind many cutting-edge technologies, achieving high accuracy on computer vision workloads such as image classification and object detection. However, training…
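The sparsity being exploited follows from a standard fact: wherever a ReLU outputs zero, the gradient flowing back through it is also zero. The short check below demonstrates that property with PyTorch; how the paper's design actually skips those zero gradients during backpropagation is not shown here.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 16, 32, 32, requires_grad=True)   # a convolutional feature map
y = F.relu(x)
loss = y.sum()
loss.backward()

act_zeros = (y == 0).float().mean().item()
grad_zeros = (x.grad == 0).float().mean().item()
print(f"zero activations: {act_zeros:.2%}, zero input-gradients: {grad_zeros:.2%}")
# The two fractions match: backpropagation through ReLU yields a gradient exactly
# as sparse as the activation map, which is the sparsity a backward pass can skip.
```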
Cocktail: Leveraging Ensemble Learning for Optimized Model Serving in Public Cloud
With a growing demand for adopting ML models for a variety of application services, it is vital that the frameworks serving these models are capable of delivering highly accurate predictions with minimal latency along with reduced deploymen…
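The trade-off the abstract alludes to, accuracy versus latency and deployment cost when choosing which models to serve together, can be sketched as a tiny selection problem. The model names, numbers, and the greedy accuracy-first rule are illustrative assumptions, not Cocktail's actual policy.

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    accuracy: float    # standalone validation accuracy (assumed numbers)
    latency_ms: float  # tail inference latency
    cost: float        # relative deployment cost

def pick_ensemble(models, latency_budget_ms, max_cost):
    """Greedy sketch: add the most accurate affordable models while every member
    (assumed to run in parallel) stays within the latency budget."""
    chosen, spent = [], 0.0
    for m in sorted(models, key=lambda m: m.accuracy, reverse=True):
        if m.latency_ms <= latency_budget_ms and spent + m.cost <= max_cost:
            chosen.append(m)
            spent += m.cost
    return chosen

if __name__ == "__main__":
    zoo = [Model("resnet50", 0.76, 22, 1.0),
           Model("mobilenet", 0.71, 6, 0.3),
           Model("efficientnet", 0.79, 35, 1.4)]
    ensemble = pick_ensemble(zoo, latency_budget_ms=30, max_cost=2.0)
    print([m.name for m in ensemble])   # ['resnet50', 'mobilenet']
```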
CASH: A Credit Aware Scheduling for Public Cloud Platforms
The public cloud offers a myriad of services that allow its tenants to process large-scale big data in a flexible, easy, and cost-effective manner. Tenants generally use large-scale data processing frameworks such as MapReduce, Tez, Spark…
Fifer: Tackling Underutilization in the Serverless Era
Datacenters are witnessing a rapid surge in the adoption of serverless functions for microservices-based applications. A vast majority of these microservices typically span less than a second, have strict SLO requirements, and are chained …
Towards Designing a Self-Managed Machine Learning Inference Serving System in Public Cloud
We are witnessing an increasing trend towards using Machine Learning (ML) based prediction systems, spanning across different application domains, including product recommendation systems, personal assistant devices, facial recognition, etc.…
Multiverse: Dynamic VM Provisioning for Virtualized High Performance Computing Clusters
Traditionally, HPC workloads have been deployed in bare-metal clusters; but the advances in virtualization have led the pathway for these workloads to be deployed in virtualized clusters. However, HPC cluster administrators/providers still…
Distilling the Essence of Raw Video to Reduce Memory Usage and Energy at Edge Devices
Video broadcast and streaming are among the most widely used applications for edge devices. Roughly 82% of the mobile internet traffic is made up of video data. This is likely to worsen with the advent of 5G that will open up new opportuni…
Quantifying Data Locality in Dynamic Parallelism in GPUs
Dynamic parallelism (DP) is a new feature of emerging GPUs that allows new kernels to be generated and scheduled from the device-side (GPU) without the host-side (CPU) intervention. To efficiently support DP, one of the major challenges is…
Opportunistic computing in GPU architectures
Data transfer overhead between computing cores and memory hierarchy has been a persistent issue for von Neumann architectures and the problem has only become more challenging with the emergence of manycore systems. A conceptually powerful …
SOML Read
NAND-based solid-state disks (SSDs) are known for their superior random read/write performance due to the high degrees of multi-chip parallelism they exhibit. Currently, as the chip density increases dramatically, fewer 3D NAND chips are n…
Quantifying Data Locality in Dynamic Parallelism in GPUs
GPUs are becoming prevalent in various domains of computing and are widely used for streaming (regular) applications. However, they are highly inefficient when executing irregular applications with unstructured inputs due to load imbalance…
The Curious Case of Container Orchestration and Scheduling in GPU-based Datacenters
Modern data centers are increasingly being provisioned with compute accelerators such as GPUs, FPGAs, and ASICs to catch up with the workload performance demands and reduce the total cost of ownership (TCO). By 2021, traffic within hypersc…
FLOSS
Today's mobile platforms have grown in sophistication to run a wide variety of frame-based applications. To deliver better QoS and energy efficiency, these applications utilize multi-flow execution, which exploits hardware-level parallelis…
Parallelizing garbage collection with I/O to improve flash resource utilization
Garbage Collection (GC) has been a critical optimization target for improving the performance of flash-based Solid State Drives (SSDs); the long-lasting GC process occupies the flash resources, thereby blocking normal I/O requests and incr…
Holistic Management of the GPGPU Memory Hierarchy to Manage Warp-level Latency Tolerance
In a modern GPU architecture, all threads within a warp execute the same instruction in lockstep. For a memory instruction, this can lead to memory divergence: the memory requests for some threads are serviced early, while the remaining re…