Matthew Mattina
Design Principles for Lifelong Learning AI Accelerators
Lifelong learning - an agent's ability to learn throughout its lifetime - is a hallmark of biological learning systems and a central challenge for artificial intelligence (AI). The development of lifelong learning algorithms could lead to …
UDC: Unified DNAS for Compressible TinyML Models
Deploying TinyML models on low-cost IoT hardware is very challenging, due to limited device memory capacity. Neural processing unit (NPU) hardware addresses the memory challenge by using model compression to exploit weight quantization and s…
Federated Learning Based on Dynamic Regularization
We propose a novel federated learning method for distributively training neural network models, where the server orchestrates cooperation between a subset of randomly chosen devices in each round. We view Federated Learning problem primari…
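The round structure the abstract describes — a server averaging updates from a random subset of clients — can be sketched as a plain FedAvg-style aggregation. All names below are illustrative, and the paper's dynamic-regularization term is deliberately omitted; this is only the baseline orchestration pattern, not the paper's method.

```python
import random

def server_round(global_weights, client_update_fn, clients, k):
    """One federated round (FedAvg-style sketch): the server picks k random
    clients, each returns locally updated weights, and the server averages
    them into the new global model."""
    chosen = random.sample(clients, k)
    updates = [client_update_fn(c, global_weights) for c in chosen]
    n = len(global_weights)
    return [sum(u[i] for u in updates) / len(updates) for i in range(n)]
```

With `k == len(clients)` every client participates, so the result is the exact mean of the client updates regardless of sampling order.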
Towards Efficient Point Cloud Graph Neural Networks Through Architectural Simplification
In recent years graph neural network (GNN)-based approaches have become a popular strategy for processing point cloud data, regularly achieving state-of-the-art performance on a variety of tasks. To date, the research community has primari…
S2TA: Exploiting Structured Sparsity for Energy-Efficient Mobile CNN Acceleration
Exploiting sparsity is a key technique in accelerating quantized convolutional neural network (CNN) inference on mobile devices. Prior sparse CNN accelerators largely exploit un-structured sparsity and achieve significant speedups. Due to …
On the Effects of Quantisation on Model Uncertainty in Bayesian Neural Networks
Bayesian neural networks (BNNs) are making significant progress in many research areas where decision-making needs to be accompanied by uncertainty estimation. Being able to quantify uncertainty while making decisions is essential for unde…
Doping: A technique for efficient compression of LSTM models using sparse structured additive matrices
Structured matrices, such as those derived from Kronecker products (KP), are effective at compressing neural networks, but can lead to unacceptable accuracy loss when applied to large models. In this paper, we propose the notion of doping …
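The Kronecker-product (KP) compression that this paper builds on can be sketched in a few lines: a large weight matrix is approximated by the Kronecker product of two small factors, so only the factors are stored. The sparse additive "doping" term the paper proposes is omitted here; the shapes below are illustrative.

```python
import numpy as np

# Approximate a (m*p) x (n*q) weight matrix W by kron(A, B), storing only
# A (m x n) and B (p x q): m*n + p*q parameters instead of m*p*n*q.
m, n, p, q = 8, 8, 32, 32
A = np.random.randn(m, n)
B = np.random.randn(p, q)
W = np.kron(A, B)             # shape (m*p, n*q) = (256, 256)

dense_params = W.size         # 65536 if W were stored densely
kp_params = A.size + B.size   # 64 + 1024 = 1088
```

Here the KP factorization stores roughly 60x fewer parameters than the dense matrix, in line with the large compression factors KP methods report.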
Information contraction in noisy binary neural networks and its implications
Neural networks have gained importance as the machine learning models that achieve state-of-the-art performance on large-scale image classification, object detection and natural language processing tasks. In this paper, we consider noisy b…
TinyLSTMs: Efficient Neural Speech Enhancement for Hearing Aids
Modern speech enhancement algorithms achieve remarkable noise suppression by means of large recurrent neural networks (RNNs). However, large RNNs limit practical deployment in hearing aid hardware (HW) form-factors, which are battery po…
MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers
Executing machine learning workloads locally on resource constrained microcontrollers (MCUs) promises to drastically expand the application space of IoT. However, so-called TinyML presents severe technical challenges, as deep neural networ…
Sparse Systolic Tensor Array for Efficient CNN Hardware Acceleration
Convolutional neural network (CNN) inference on mobile devices demands efficient hardware acceleration of low-precision (INT8) general matrix multiplication (GEMM). Exploiting data sparsity is a common approach to further accelerate GEMM f…
High Throughput Matrix-Matrix Multiplication between Asymmetric Bit-Width Operands
Matrix multiplications between asymmetric bit-width operands, especially between 8- and 4-bit operands, are likely to become a fundamental kernel of many important workloads including neural networks and machine learning. While existing SIM…
Efficient Residue Number System Based Winograd Convolution
Prior research has shown that the Winograd algorithm can reduce the computational complexity of convolutional neural networks (CNNs) with weights and activations represented in floating point. However, it is difficult to apply the scheme to the …
Ternary MobileNets via Per-Layer Hybrid Filter Banks
The MobileNets family of computer vision neural networks has fueled tremendous progress in the design and organization of resource-efficient architectures in recent years. New applications with stringent real-time requirements on highly const…
Systolic Tensor Array: An Efficient Structured-Sparse GEMM Accelerator for Mobile CNN Inference
Convolutional neural network (CNN) inference on mobile devices demands efficient hardware acceleration of low-precision (INT8) general matrix multiplication (GEMM). The systolic array (SA) is a pipelined 2D array of processing elements (PE…
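The structured sparsity such accelerators exploit can be illustrated with a simple block-wise pruning rule: within each fixed-size block of weights, keep only the largest-magnitude entries and zero the rest, so the hardware can rely on a fixed nonzero count per block. The function below is an illustrative sketch of that idea, not the paper's exact scheme; block size and keep count are assumed parameters.

```python
import numpy as np

def prune_blocks(w, block=4, keep=2):
    """Block-structured pruning sketch: within each block of `block`
    consecutive weights, keep the `keep` largest magnitudes and zero the
    rest, yielding a bounded number of nonzeros per block."""
    w = w.reshape(-1, block).copy()
    # Indices of the (block - keep) smallest-magnitude entries per block.
    drop = np.argsort(np.abs(w), axis=1)[:, :-keep]
    np.put_along_axis(w, drop, 0.0, axis=1)
    return w.reshape(-1)
```

Because every block ends up with exactly `keep` nonzeros, a hardware datapath can provision a fixed number of multipliers per block instead of handling arbitrary unstructured sparsity.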
Searching for Winograd-aware Quantized Networks
Lightweight architectural designs of Convolutional Neural Networks (CNNs) together with quantization have paved the way for the deployment of demanding computer vision applications on mobile devices. Parallel to this, alternative formulati…
Compressing Language Models using Doped Kronecker Products
Kronecker Products (KP) have been used to compress IoT RNN applications by 15-38x compression factors, achieving better results than traditional compression methods. However, when KP is applied to large Natural Language Processing tasks, it…
Noisy Machines: Understanding Noisy Neural Networks and Enhancing Robustness to Analog Hardware Errors Using Distillation
The success of deep learning has brought forth a wave of interest in computer hardware design to better meet the high demands of neural network inference. In particular, analog computing hardware has been heavily motivated specifically for…
Rank and run-time aware compression of NLP Applications
Sequence model based NLP applications can be large. Yet, many applications that benefit from them run on small devices with very limited compute and storage capabilities, while still having run-time constraints. As a result, there is a nee…
Pushing the limits of RNN Compression
Recurrent Neural Networks (RNN) can be difficult to deploy on resource constrained devices due to their size. As a result, there is a need for compression techniques that can significantly compress RNNs without negatively impacting task ac…
ISP4ML: Understanding the Role of Image Signal Processing in Efficient Deep Learning Vision Systems
Convolutional neural networks (CNNs) are now predominant components in a variety of computer vision (CV) systems. These systems typically include an image signal processor (ISP), even though the ISP is traditionally designed to produce ima…
Learning Low-precision Neural Networks without Straight-Through Estimator (STE)
The Straight-Through Estimator (STE) is widely used for back-propagating gradients through the quantization function, but the STE technique lacks a complete theoretical understanding. We propose an alternative methodology called alpha-blen…
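The baseline STE that this paper proposes to replace is easy to state concretely: the forward pass uses quantized weights, but since rounding has zero gradient almost everywhere, the backward pass pretends the quantizer is the identity and applies the upstream gradient directly to the full-precision weights. The sketch below illustrates that baseline (not the paper's alpha-blending alternative); step size and learning rate are illustrative.

```python
import numpy as np

def quantize(w, step=0.1):
    """Uniform quantizer: round weights to the nearest multiple of `step`."""
    return np.round(w / step) * step

def ste_grad(grad_wrt_quantized):
    """Straight-Through Estimator: round() has zero gradient almost
    everywhere, so STE passes the upstream gradient through unchanged
    when updating the full-precision weights."""
    return grad_wrt_quantized

# One SGD step on the toy loss L(w) = 0.5 * (quantize(w) - t)**2.
w, t, lr = 0.234, 1.0, 0.5
q = quantize(w)                  # forward uses the quantized weight (0.2)
grad_q = q - t                   # dL/dq
w = w - lr * ste_grad(grad_q)    # STE: treat dq/dw as 1
```

The mismatch between the true (zero) gradient of `round()` and the identity surrogate is exactly the theoretical gap the abstract refers to.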
Compressing RNNs for IoT devices by 15-38x using Kronecker Products
Recurrent Neural Networks (RNN) can be difficult to deploy on resource constrained devices due to their size. As a result, there is a need for compression techniques that can significantly compress RNNs without negatively impacting task acc…
Efficient Winograd or Cook-Toom Convolution Kernel Implementation on Widely Used Mobile CPUs
The Winograd or Cook-Toom class of algorithms help to reduce the overall compute complexity of many modern deep convolutional neural networks (CNNs). Although there has been a lot of research done on model and algorithmic optimization of C…
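The compute saving the abstract refers to can be seen in the smallest Winograd instance, F(2,3): two outputs of a 1-D convolution with a 3-tap filter are computed with 4 multiplies instead of 6, using the standard transform matrices. The sketch below verifies the transform against direct convolution.

```python
import numpy as np

# Winograd F(2,3) transform matrices (standard form).
BT = np.array([[1, 0, -1, 0],
               [0, 1,  1, 0],
               [0, -1, 1, 0],
               [0, 1,  0, -1]], dtype=float)
G = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0, 0.0, 1.0]])
AT = np.array([[1, 1, 1, 0],
               [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    """d: 4 input samples, g: 3 filter taps -> 2 convolution outputs,
    via y = AT @ ((G @ g) * (BT @ d)); the elementwise product is the
    only stage with multiplies (4 of them)."""
    return AT @ ((G @ g) * (BT @ d))

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([1.0, 0.0, -1.0])
# Direct 1-D convolution for comparison:
direct = np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                   d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
```

The filter transform `G @ g` can be precomputed once per filter, which is what makes the scheme attractive for CNN inference where the same weights are reused across many tiles.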
SpArSe: Sparse Architecture Search for CNNs on Resource-Constrained Microcontrollers
The vast majority of processors in the world are actually microcontroller units (MCUs), which find widespread use performing simple control tasks in applications ranging from automobiles to medical devices and office equipment. The Interne…
Ternary Hybrid Neural-Tree Networks for Highly Constrained IoT Applications
Machine learning-based applications are increasingly prevalent in IoT devices. The power and storage constraints of these devices make it particularly challenging to run modern neural networks, limiting the number of new applications that …
Measuring scheduling efficiency of RNNs for NLP applications
Recurrent neural networks (RNNs) have shown state-of-the-art results for speech recognition, natural language processing, image captioning and video summarization applications. Many of these applications run on low-power platforms, so their …