Matthew Mattina
Design Principles for Lifelong Learning AI Accelerators
Lifelong learning - an agent's ability to learn throughout its lifetime - is a hallmark of biological learning systems and a central challenge for artificial intelligence (AI). The development of lifelong learning algorithms could lead to …
UDC: Unified DNAS for Compressible TinyML Models
Deploying TinyML models on low-cost IoT hardware is very challenging, due to limited device memory capacity. Neural processing unit (NPU) hardware addresses the memory challenge by using model compression to exploit weight quantization and s…
Federated Learning Based on Dynamic Regularization
We propose a novel federated learning method for distributively training neural network models, where the server orchestrates cooperation between a subset of randomly chosen devices in each round. We view Federated Learning problem primari…
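The round structure the abstract describes — a server averaging updates from a random subset of clients — can be sketched as a plain FedAvg-style aggregation. All names below are illustrative, and the paper's dynamic-regularization term is deliberately omitted; this is only the baseline orchestration pattern, not the paper's method.

```python
import random

def server_round(global_weights, client_update_fn, clients, k):
    """One federated round (FedAvg-style sketch): the server picks k random
    clients, each returns locally updated weights, and the server averages
    them into the new global model."""
    chosen = random.sample(clients, k)
    updates = [client_update_fn(c, global_weights) for c in chosen]
    n = len(global_weights)
    return [sum(u[i] for u in updates) / len(updates) for i in range(n)]
```

With `k == len(clients)` every client participates, so the result is the exact mean of the client updates regardless of sampling order.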
Towards Efficient Point Cloud Graph Neural Networks Through Architectural Simplification
In recent years graph neural network (GNN)-based approaches have become a popular strategy for processing point cloud data, regularly achieving state-of-the-art performance on a variety of tasks. To date, the research community has primari…
S2TA: Exploiting Structured Sparsity for Energy-Efficient Mobile CNN Acceleration
Exploiting sparsity is a key technique in accelerating quantized convolutional neural network (CNN) inference on mobile devices. Prior sparse CNN accelerators largely exploit un-structured sparsity and achieve significant speedups. Due to …
On the Effects of Quantisation on Model Uncertainty in Bayesian Neural Networks
Bayesian neural networks (BNNs) are making significant progress in many research areas where decision-making needs to be accompanied by uncertainty estimation. Being able to quantify uncertainty while making decisions is essential for unde…
Doping: A technique for efficient compression of LSTM models using sparse structured additive matrices
Structured matrices, such as those derived from Kronecker products (KP), are effective at compressing neural networks, but can lead to unacceptable accuracy loss when applied to large models. In this paper, we propose the notion of doping …
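The Kronecker-product (KP) compression that this paper builds on can be sketched in a few lines: a large weight matrix is approximated by the Kronecker product of two small factors, so only the factors are stored. The sparse additive "doping" term the paper proposes is omitted here; the shapes below are illustrative.

```python
import numpy as np

# Approximate a (m*p) x (n*q) weight matrix W by kron(A, B), storing only
# A (m x n) and B (p x q): m*n + p*q parameters instead of m*p*n*q.
m, n, p, q = 8, 8, 32, 32
A = np.random.randn(m, n)
B = np.random.randn(p, q)
W = np.kron(A, B)             # shape (m*p, n*q) = (256, 256)

dense_params = W.size         # 65536 if W were stored densely
kp_params = A.size + B.size   # 64 + 1024 = 1088
```

Here the KP factorization stores roughly 60x fewer parameters than the dense matrix, in line with the large compression factors KP methods report.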
Information contraction in noisy binary neural networks and its implications
Neural networks have gained importance as the machine learning models that achieve state-of-the-art performance on large-scale image classification, object detection and natural language processing tasks. In this paper, we consider noisy b…
TinyLSTMs: Efficient Neural Speech Enhancement for Hearing Aids
Modern speech enhancement algorithms achieve remarkable noise suppression by means of large recurrent neural networks (RNNs). However, large RNNs limit practical deployment in hearing aid hardware (HW) form-factors, which are battery po…
MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers
Executing machine learning workloads locally on resource constrained microcontrollers (MCUs) promises to drastically expand the application space of IoT. However, so-called TinyML presents severe technical challenges, as deep neural networ…
Sparse Systolic Tensor Array for Efficient CNN Hardware Acceleration
Convolutional neural network (CNN) inference on mobile devices demands efficient hardware acceleration of low-precision (INT8) general matrix multiplication (GEMM). Exploiting data sparsity is a common approach to further accelerate GEMM f…
High Throughput Matrix-Matrix Multiplication between Asymmetric Bit-Width Operands
Matrix multiplications between asymmetric bit-width operands, especially between 8- and 4-bit operands, are likely to become a fundamental kernel of many important workloads including neural networks and machine learning. While existing SIM…
Efficient Residue Number System Based Winograd Convolution
Prior research has shown that the Winograd algorithm can reduce the computational complexity of convolutional neural networks (CNNs) with weights and activations represented in floating point. However, it is difficult to apply the scheme to the …
Ternary MobileNets via Per-Layer Hybrid Filter Banks
The MobileNets family of computer vision neural networks has fueled tremendous progress in the design and organization of resource-efficient architectures in recent years. New applications with stringent real-time requirements on highly const…
Systolic Tensor Array: An Efficient Structured-Sparse GEMM Accelerator for Mobile CNN Inference
Convolutional neural network (CNN) inference on mobile devices demands efficient hardware acceleration of low-precision (INT8) general matrix multiplication (GEMM). The systolic array (SA) is a pipelined 2D array of processing elements (PE…
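The structured sparsity such accelerators exploit can be illustrated with a simple block-wise pruning rule: within each fixed-size block of weights, keep only the largest-magnitude entries and zero the rest, so the hardware can rely on a fixed nonzero count per block. The function below is an illustrative sketch of that idea, not the paper's exact scheme; block size and keep count are assumed parameters.

```python
import numpy as np

def prune_blocks(w, block=4, keep=2):
    """Block-structured pruning sketch: within each block of `block`
    consecutive weights, keep the `keep` largest magnitudes and zero the
    rest, yielding a bounded number of nonzeros per block."""
    w = w.reshape(-1, block).copy()
    # Indices of the (block - keep) smallest-magnitude entries per block.
    drop = np.argsort(np.abs(w), axis=1)[:, :-keep]
    np.put_along_axis(w, drop, 0.0, axis=1)
    return w.reshape(-1)
```

Because every block ends up with exactly `keep` nonzeros, a hardware datapath can provision a fixed number of multipliers per block instead of handling arbitrary unstructured sparsity.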
Searching for Winograd-aware Quantized Networks
Lightweight architectural designs of Convolutional Neural Networks (CNNs) together with quantization have paved the way for the deployment of demanding computer vision applications on mobile devices. Parallel to this, alternative formulati…
Compressing Language Models using Doped Kronecker Products
Kronecker Products (KP) have been used to compress IoT RNN applications by 15-38x compression factors, achieving better results than traditional compression methods. However, when KP is applied to large Natural Language Processing tasks, it…
Noisy Machines: Understanding Noisy Neural Networks and Enhancing Robustness to Analog Hardware Errors Using Distillation
The success of deep learning has brought forth a wave of interest in computer hardware design to better meet the high demands of neural network inference. In particular, analog computing hardware has been heavily motivated specifically for…
Rank and run-time aware compression of NLP Applications
Sequence model based NLP applications can be large. Yet, many applications that benefit from them run on small devices with very limited compute and storage capabilities, while still having run-time constraints. As a result, there is a nee…
Pushing the limits of RNN Compression
Recurrent Neural Networks (RNN) can be difficult to deploy on resource constrained devices due to their size. As a result, there is a need for compression techniques that can significantly compress RNNs without negatively impacting task ac…
ISP4ML: Understanding the Role of Image Signal Processing in Efficient Deep Learning Vision Systems
Convolutional neural networks (CNNs) are now predominant components in a variety of computer vision (CV) systems. These systems typically include an image signal processor (ISP), even though the ISP is traditionally designed to produce ima…
Learning Low-precision Neural Networks without Straight-Through Estimator (STE)
The Straight-Through Estimator (STE) is widely used for back-propagating gradients through the quantization function, but the STE technique lacks a complete theoretical understanding. We propose an alternative methodology called alpha-blen…
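The baseline STE that this paper proposes to replace is easy to state concretely: the forward pass uses quantized weights, but since rounding has zero gradient almost everywhere, the backward pass pretends the quantizer is the identity and applies the upstream gradient directly to the full-precision weights. The sketch below illustrates that baseline (not the paper's alpha-blending alternative); step size and learning rate are illustrative.

```python
import numpy as np

def quantize(w, step=0.1):
    """Uniform quantizer: round weights to the nearest multiple of `step`."""
    return np.round(w / step) * step

def ste_grad(grad_wrt_quantized):
    """Straight-Through Estimator: round() has zero gradient almost
    everywhere, so STE passes the upstream gradient through unchanged
    when updating the full-precision weights."""
    return grad_wrt_quantized

# One SGD step on the toy loss L(w) = 0.5 * (quantize(w) - t)**2.
w, t, lr = 0.234, 1.0, 0.5
q = quantize(w)                  # forward uses the quantized weight (0.2)
grad_q = q - t                   # dL/dq
w = w - lr * ste_grad(grad_q)    # STE: treat dq/dw as 1
```

The mismatch between the true (zero) gradient of `round()` and the identity surrogate is exactly the theoretical gap the abstract refers to.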
Compressing RNNs for IoT devices by 15-38x using Kronecker Products
Recurrent Neural Networks (RNN) can be difficult to deploy on resource constrained devices due to their size. As a result, there is a need for compression techniques that can significantly compress RNNs without negatively impacting task acc…
Efficient Winograd or Cook-Toom Convolution Kernel Implementation on Widely Used Mobile CPUs
The Winograd or Cook-Toom class of algorithms help to reduce the overall compute complexity of many modern deep convolutional neural networks (CNNs). Although there has been a lot of research done on model and algorithmic optimization of C…
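The compute saving the abstract refers to can be seen in the smallest Winograd instance, F(2,3): two outputs of a 1-D convolution with a 3-tap filter are computed with 4 multiplies instead of 6, using the standard transform matrices. The sketch below verifies the transform against direct convolution.

```python
import numpy as np

# Winograd F(2,3) transform matrices (standard form).
BT = np.array([[1, 0, -1, 0],
               [0, 1,  1, 0],
               [0, -1, 1, 0],
               [0, 1,  0, -1]], dtype=float)
G = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0, 0.0, 1.0]])
AT = np.array([[1, 1, 1, 0],
               [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    """d: 4 input samples, g: 3 filter taps -> 2 convolution outputs,
    via y = AT @ ((G @ g) * (BT @ d)); the elementwise product is the
    only stage with multiplies (4 of them)."""
    return AT @ ((G @ g) * (BT @ d))

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([1.0, 0.0, -1.0])
# Direct 1-D convolution for comparison:
direct = np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                   d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
```

The filter transform `G @ g` can be precomputed once per filter, which is what makes the scheme attractive for CNN inference where the same weights are reused across many tiles.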
SpArSe: Sparse Architecture Search for CNNs on Resource-Constrained Microcontrollers
The vast majority of processors in the world are actually microcontroller units (MCUs), which find widespread use performing simple control tasks in applications ranging from automobiles to medical devices and office equipment. The Interne…
Ternary Hybrid Neural-Tree Networks for Highly Constrained IoT Applications
Machine learning-based applications are increasingly prevalent in IoT devices. The power and storage constraints of these devices make it particularly challenging to run modern neural networks, limiting the number of new applications that …
Measuring scheduling efficiency of RNNs for NLP applications
Recurrent neural networks (RNNs) have shown state-of-the-art results for speech recognition, natural language processing, image captioning and video summarization applications. Many of these applications run on low-power platforms, so their …