Speedup ≈ Speedup
UNet++: Redesigning Skip Connections to Exploit Multiscale Features in Image Segmentation
The state-of-the-art models for medical image segmentation are variants of U-Net and fully convolutional networks (FCN). Despite their success, these models have two limitations: (1) their optimal depth is a priori unknown, requiring extens…
Instant neural graphics primitives with a multiresolution hash encoding
Neural graphics primitives, parameterized by fully connected neural networks, can be costly to train and evaluate. We reduce this cost with a versatile new input encoding that permits the use of a smaller network without sacrificing qualit…
Simplifying Graph Convolutional Networks
Graph Convolutional Networks (GCNs) and their variants have experienced significant attention and have become the de facto methods for learning graph representations. GCNs derive inspiration primarily from recent deep learning approaches, …
EfficientNetV2: Smaller Models and Faster Training
This paper introduces EfficientNetV2, a new family of convolutional networks that have faster training speed and better parameter efficiency than previous models. To develop this family of models, we use a combination of training-aware neu…
Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization
Performance of machine learning algorithms depends critically on identifying a good set of hyperparameters. While recent approaches use Bayesian optimization to adaptively select configurations, we focus on speeding up random search throug…
Survey of Multifidelity Methods in Uncertainty Propagation, Inference, and Optimization
In many situations across computational science and engineering, multiple computational models are available that describe a system of interest. These different models have varying evaluation costs and varying fidelities. Typically, a comp…
ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
We introduce an extremely computation-efficient CNN architecture named ShuffleNet, which is designed specially for mobile devices with very limited computing power (e.g., 10-150 MFLOPs). The new architecture utilizes two new operations, po…
Mixed Precision Training
Deep neural networks have enabled progress in a wide variety of applications. Growing the size of the neural network typically results in improved accuracy. As model sizes grow, the memory and compute requirements for training these models…
Learning to Optimize: Training Deep Neural Networks for Interference Management
For the past couple of decades, numerical optimization has played a central role in addressing wireless resource management problems such as power control and beamformer design. However, optimization algorithms often entail considerable co…
DeepTrust^RT: Confidential Deep Neural Inference Meets Real-Time!
Deep Neural Networks (DNNs) are becoming common in "learning-enabled" time-critical applications such as autonomous driving and robotics. One approach to protect DNN inference from adversarial actions and preserve model privacy/confidentia…
Quantizing deep convolutional networks for efficient inference: A whitepaper
We present an overview of techniques for quantizing convolutional neural networks for inference with integer weights and activations. Per-channel quantization of weights and per-layer quantization of activations to 8-bits of precision post…
Snorkel
Labeling training data is increasingly the largest bottleneck in deploying machine learning systems. We present Snorkel, a first-of-its-kind system that enables users to train state-of-the-art models without hand labeling any training data…
Direct Training for Spiking Neural Networks: Faster, Larger, Better
Spiking neural networks (SNNs) that enable energy-efficient implementation on emerging neuromorphic hardware are gaining more attention. So far, however, SNNs have not shown competitive performance compared with artificial neural networks (ANNs),…
Collaborative Metric Learning
Metric learning algorithms produce distance metrics that capture the important relationships among data. In this work, we study the connection between metric learning and collaborative filtering. We propose Collaborative Metric Learning (C…
GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration
Despite advances in scalable models, the inference tools used for Gaussian processes (GPs) have yet to fully capitalize on developments in computing hardware. We present an efficient and general approach to GP inference based on Blackbox M…
Tackling the Qubit Mapping Problem for NISQ-Era Quantum Devices
Due to little consideration of hardware constraints, e.g., limited connections between physical qubits to enable two-qubit gates, most quantum algorithms cannot be directly executed on the Noisy Intermediate-Scale Quantum (NISQ) devices…
Toward the first quantum simulation with quantum speedup
With quantum computers of significant size now on the horizon, we should understand how to best exploit their initially limited abilities. To this end, we aim to identify a practical problem that is beyond the reach of current classical co…
Large Batch Training of Convolutional Networks
A common way to speed up training of large convolutional networks is to add computational units. Training is then performed using data-parallel synchronous Stochastic Gradient Descent (SGD) with mini-batch divided between computational uni…
Modeling Site Heterogeneity with Posterior Mean Site Frequency Profiles Accelerates Accurate Phylogenomic Estimation
Proteins have distinct structural and functional constraints at different sites that lead to site-specific preferences for particular amino acid residues as the sequences evolve. Heterogeneity in the amino acid substitution process between…
Parallel Restarted SGD with Faster Convergence and Less Communication: Demystifying Why Model Averaging Works for Deep Learning
In distributed training of deep neural networks, parallel minibatch SGD is widely used to speed up the training process by using multiple workers. It uses multiple workers to sample local stochastic gradients in parallel, aggregates all gr…
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Transformers are slow and memory-hungry on long sequences, since the time and memory complexity of self-attention are quadratic in sequence length. Approximate attention methods have attempted to address this problem by trading off model q…
Understanding of Object Detection Based on CNN Family and YOLO
As a key use of image processing, object detection has boomed along with the unprecedented advancement of Convolutional Neural Network (CNN) and its variants since 2012. When CNN series develops to Faster Region with CNN (R-CNN), the Mean …
A Parallel Random Forest Algorithm for Big Data in a Spark Cloud Computing Environment
With the emergence of the big data age, the issue of how to obtain valuable knowledge from a dataset efficiently and accurately has attracted increasing attention from both academia and industry. This paper presents a Parallel Random …
A Frustratingly Easy Approach for Entity and Relation Extraction
End-to-end relation extraction aims to identify named entities and extract relations between them. Most recent work models these two subtasks jointly, either by casting them in one structured prediction framework, or performing multi-task …
Physics-Inspired Optimization for Quadratic Unconstrained Problems Using a Digital Annealer
The Fujitsu Digital Annealer (DA) is designed to solve fully connected quadratic unconstrained binary optimization (QUBO) problems. It is implemented on application-specific CMOS hardware and currently solves problems of up to 1024 vari…
Quantum optimization of maximum independent set using Rydberg atom arrays
Realizing quantum speedup for practically relevant, computationally hard problems is a central challenge in quantum information science. Using Rydberg atom arrays with up to 289 qubits in two spatial dimensions, we experimentally investiga…
An improved chain of spheres for exchange algorithm
In the present work, we describe a more accurate and efficient variant of the chain-of-spheres algorithm (COSX) for exchange matrix computations. Higher accuracy for the numerical integration is obtained with new grids that were developed …
AMC: AutoML for Model Compression and Acceleration on Mobile Devices
Model compression is a critical technique to efficiently deploy neural network models on mobile devices which have limited computation resources and tight power budgets. Conventional model compression techniques rely on hand-crafted heuris…
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
In deep learning, models typically reuse the same parameters for all inputs. Mixture of Experts (MoE) defies this and instead selects different parameters for each incoming example. The result is a sparsely-activated model -- with outrageo…
Fast YOLO: A Fast You Only Look Once System for Real-time Embedded Object Detection in Video
Object detection is considered one of the most challenging problems in the field of computer vision, as it involves the combination of object classification and object localization within a scene. Recently, deep neural networks (DNNs) have b…