Memory- and Communication-Aware Model Compression for Distributed Deep Learning Inference on IoT
2019-10-08 • Kartikeya Bhardwaj, Ching‐Yi Lin, Anderson L. Sartor, Radu Mărculescu
Model compression has emerged as an important area of research for deploying deep learning models on Internet-of-Things (IoT) devices. However, for extremely memory-constrained scenarios, even the compressed models cannot fit within the memory of a single device and, as a result, must be distributed across multiple devices. This leads to a distributed inference paradigm in which memory and communication costs represent a major bottleneck. Yet, existing model compression techniques are not communication-aware. Therefore, w…