Parth Malani
Dynamic Idle Resource Leasing To Safely Oversubscribe Capacity At Meta
Revisiting Reliability in Large-Scale Machine Learning Research Clusters
Reliability is a fundamental challenge in operating large-scale machine learning (ML) infrastructures, particularly as the scale of ML models and training clusters continues to grow. Despite decades of research on infrastructure failures, …
Expanding Datacenter Capacity with DVFS Boosting: A safe and scalable deployment experience
The COVID-19 pandemic created unexpected demand for our physical infrastructure. We increased our computing supply by growing our infrastructure footprint and expanded existing capacity using various techniques, among them DVFS boost…
Tutorial: MARS: A framework for runtime monitoring, modeling, and management of realtime systems
From datacenters to embedded devices, modern realtime workloads are demanding exceptional computational capacity from state-of-the-art systems, while satisfying energy constraints, real-time deadlines, mixed criticality workloads, and sati…
Interference and Need Aware Workload Colocation in Hyperscale Datacenters
Datacenters suffer from resource utilization inefficiencies due to the conflicting goals of service owners and platform providers. Service owners intending to maintain Service Level Objectives (SLO) for themselves typically request a conse…
Transitive Power Modeling for Improving Resource Efficiency in a Hyperscale Datacenter
Maintaining efficient utilization of allocated compute resources and controlling their capital and operating expenditure is important for running a hyperscale datacenter infrastructure. Power is one of the most constrained and difficult to…
Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications
The application of deep learning techniques has resulted in remarkable improvement of machine learning models. This paper provides detailed characterizations of deep learning models used in many Facebook social network services. We present …