OneAdapt Article Swipe
· 2023
· Open Access
· DOI: https://doi.org/10.1145/3620678.3624653
· OA: W4387389733
Deep learning inference on streaming media data, such as object detection in video or LiDAR feeds and text extraction from audio waves, is now ubiquitous. To achieve high inference accuracy, these applications typically require significant network bandwidth to gather high-fidelity data and extensive GPU resources to run deep neural networks (DNNs). While the high demand for network bandwidth and GPU resources could be substantially reduced by optimally adapting the configuration knobs, such as video resolution and frame rate, current adaptation techniques fail to meet three requirements simultaneously: adapt configurations (i) with minimum extra GPU or bandwidth overhead; (ii) to reach near-optimal decisions based on how the data affects the final DNN's accuracy; and (iii) to do so for a range of configuration knobs. This paper presents OneAdapt, which meets these requirements by leveraging a gradient-ascent strategy to adapt configuration knobs. The key idea is to embrace DNNs' differentiability to quickly estimate the accuracy's gradient with respect to each configuration knob, called AccGrad. Specifically, OneAdapt estimates AccGrad by multiplying two gradients: InputGrad (i.e., how each configuration knob affects the input to the DNN) and DNNGrad (i.e., how the DNN input affects the DNN inference output). We evaluate OneAdapt across five types of configurations, four analytic tasks, and five types of input data. Compared to state-of-the-art adaptation schemes, OneAdapt cuts bandwidth usage and GPU usage by 15-59% while maintaining comparable accuracy, or improves accuracy by 1-5% while using equal or fewer resources.