Explanipedia

YOLOX: Exceeding YOLO Series in 2021 Open

Zheng Ge, Songtao Liu, Feng Wang, Zeming Li, Jian Sun · 2021

In this report, we present some experienced improvements to YOLO series, forming a new high-performance detector -- YOLOX. We switch the YOLO detector to an anchor-free manner and conduct other advanced detection techniques, i.e., a decoup…

ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation Open

Adam Paszke, Abhishek Chaurasia, Sangpil Kim, Eugenio Culurciello · 2016

Computer science Art

The ability to perform pixel-wise semantic segmentation in real-time is of paramount importance in mobile applications. Recent deep neural networks aimed at this task have the disadvantage of requiring a large number of floating point oper…

Conversion of Continuous-Valued Deep Networks to Efficient Event-Driven Networks for Image Classification Open

Bodo Rueckauer, Iulia-Alexandra Lungu, Yuhuang Hu, Michael Pfeiffer, Shih‐Chii Liu · 2017

Computer science Sociology

Spiking neural networks (SNNs) can potentially offer an efficient way of doing inference because the neurons in the networks are sparsely activated and computations are event-driven. Previous work showed that simple continuous-valued deep …

Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks Open

Yang He, Guoliang Kang, Xuanyi Dong, Yanwei Fu, Yi Yang · 2018

Computer science Biology

This paper proposed a Soft Filter Pruning (SFP) method to accelerate the inference procedure of deep Convolutional Neural Networks (CNNs). Specifically, the proposed SFP enables the pruned filters to be updated when training the model afte…

ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices Open

Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, Jian Sun · 2017

Computer science Mathematics

We introduce an extremely computation-efficient CNN architecture named ShuffleNet, which is designed specially for mobile devices with very limited computing power (e.g., 10-150 MFLOPs). The new architecture utilizes two new operations, po…

Learning Transferable Architectures for Scalable Image Recognition Open

Barret Zoph, Vijay Vasudevan, Jonathon Shlens, Quoc V. Le · 2018

Computer science Mathematics Art

Developing neural network image classification models often requires significant architecture engineering. In this paper, we study a method to learn the model architectures directly on the dataset of interest. As this approach is expensive…

EfficientDet: Scalable and Efficient Object Detection Open

Mingxing Tan, Ruoming Pang, Quoc V. Le · 2019

Computer science Physics Philosophy

Model efficiency has become increasingly important in computer vision. In this paper, we systematically study neural network architecture design choices for object detection and propose several key optimizations to improve efficiency. Firs…

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness Open

Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré · 2022

Computer science

Transformers are slow and memory-hungry on long sequences, since the time and memory complexity of self-attention are quadratic in sequence length. Approximate attention methods have attempted to address this problem by trading off model q…

ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks Open

Qilong Wang, Banggu Wu, Pengfei Zhu, Peihua Li, Wangmeng Zuo, et al. · 2019

Computer science Mathematics

Recently, the channel attention mechanism has been shown to offer great potential in improving the performance of deep convolutional neural networks (CNNs). However, most existing methods are dedicated to developing more sophisticated attention m…

Visual Transformers: Token-based Image Representation and Processing for Computer Vision Open

Bichen Wu, Chenfeng Xu, Xiaoliang Dai, Alvin Wan, Peizhao Zhang, et al. · 2020

Computer science Mathematics Engineering

Computer vision has achieved remarkable success by (a) representing images as uniformly-arranged pixel arrays and (b) convolving highly-localized features. However, convolutions treat all image pixels equally regardless of importance; expl…

Early Convolutions Help Transformers See Better Open

Tete Xiao, Mannat Singh, Eric Mintun, Trevor Darrell, Piotr Dollár, et al. · 2021

Computer science Mathematics Engineering

Vision transformer (ViT) models exhibit substandard optimizability. In particular, they are sensitive to the choice of optimizer (AdamW vs. SGD), optimizer hyperparameters, and training schedule length. In comparison, modern convolutional …

DeepViT: Towards Deeper Vision Transformer Open

Daquan Zhou, Bingyi Kang, Xiaojie Jin, Linjie Yang, Xiaochen Lian, et al. · 2021

Computer science Engineering Mathematics

Vision transformers (ViTs) have been successfully applied in image classification tasks recently. In this paper, we show that, unlike convolutional neural networks (CNNs) that can be improved by stacking more convolutional layers, the perform…

Residual Attention Network for Image Classification Open

Fei Wang, Mengqing Jiang, Chen Qian, Shuo Yang, Cheng Li, et al. · 2017

Computer science Engineering Geography

In this work, we propose "Residual Attention Network", a convolutional neural network using attention mechanism which can incorporate with state-of-art feed forward network architecture in an end-to-end training fashion. Our Residual Atten…

Deep Roots: Improving CNN Efficiency with Hierarchical Filter Groups Open

Yani Ioannou, Duncan Robertson, Roberto Cipolla, Antonio Criminisi · 2017

Computer science Mathematics

CVPR 2017 Poster for "Deep Roots: Improving CNN Efficiency With Hierarchical Filter Groups", to be presented at conference

MixConv: Mixed Depthwise Convolutional Kernels Open

Mingxing Tan, Quoc V. Le · 2019

Computer science Mathematics Philosophy

Depthwise convolution is becoming increasingly popular in modern efficient ConvNets, but its kernel size is often overlooked. In this paper, we systematically study the impact of different kernel sizes, and observe that combining the benef…

BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View Open

Junjie Huang, Guan Huang, Zheng Zhu, Yun Ye, Dalong Du · 2021

Computer science Economics Biology

Autonomous driving perceives its surroundings for decision making, which is one of the most complex scenarios in visual perception. The success of paradigm innovation in solving the 2D object detection task inspires us to seek an elegant, …

GhostNetV2: Enhance Cheap Operation with Long-Range Attention Open

Yehui Tang, Kai Han, Jianyuan Guo, Chang Xu, Chao Xu, et al. · 2022

Computer science Mathematics Materials science

Light-weight convolutional neural networks (CNNs) are specially designed for applications on mobile devices with faster inference speed. The convolutional operation can only capture local information in a window region, which prevents perf…

EfficientFormer: Vision Transformers at MobileNet Speed Open

Yanyu Li, Geng Yuan, Yang Wen, Eric Hu, Georgios Evangelidis, et al. · 2022

Computer science Art Engineering

Vision Transformers (ViT) have shown rapid progress in computer vision tasks, achieving promising results on various benchmarks. However, due to the massive number of parameters and model design, e.g., attention mechanism, ViT-bas…

SlimYOLOv3: Narrower, Faster and Better for Real-Time UAV Applications Open

Pengyi Zhang, Yunxin Zhong, Xiaoqiong Li · 2019

Computer science Philosophy Geography

Drones or general Unmanned Aerial Vehicles (UAVs), endowed with computer vision function by on-board cameras and embedded systems, have become popular in a wide range of applications. However, real-time scene parsing through object detecti…

Single Path One-Shot Neural Architecture Search with Uniform Sampling Open

Zichao Guo, Xiangyu Zhang, Haoyuan Mu, Wen Heng, Zechun Liu, et al. · 2019

Computer science Art

We revisit the one-shot Neural Architecture Search (NAS) paradigm and analyze its advantages over existing NAS approaches. Existing one-shot methods, however, are hard to train and not yet effective on large-scale datasets like ImageNet. Thi…

PP-YOLO: An Effective and Efficient Implementation of Object Detector Open

Xiang Long, Kaipeng Deng, Guanzhong Wang, Yang Zhang, Qingqing Dang, et al. · 2020

Computer science

Object detection is one of the most important areas in computer vision, and it plays a key role in various practical scenarios. Due to hardware limitations, it is often necessary to sacrifice accuracy to ensure the inference speed of the dete…

ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design Open

Ningning Ma, Xiangyu Zhang, Hai-Tao Zheng, Jian Sun · 2018

Computer science Engineering Economics

Currently, the neural network architecture design is mostly guided by the indirect metric of computation complexity, i.e., FLOPs. However, the direct metric, e.g., speed, also depends on the other factors such as memory acces…

UNETR++: Delving Into Efficient and Accurate 3D Medical Image Segmentation Open

Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, et al. · 2024

Computer science Mathematics

Owing to the success of transformer models, recent works study their applicability in 3D medical segmentation tasks. Within the transformer models, the self-attention mechanism is one of the main building blocks that strives to capture lon…

Stand-Alone Self-Attention in Vision Models Open

Prajit Ramachandran, Niki Parmar, Ashish Vaswani, Irwan Bello, Anselm Levskaya, et al. · 2019

Computer science Mathematics Physics

Convolutions are a fundamental building block of modern computer vision systems. Recent approaches have argued for going beyond convolutions in order to capture long-range dependencies. These efforts focus on augmenting convolutional model…

Efficient Transformer for Remote Sensing Image Segmentation Open

Zhiyong Xu, Weicun Zhang, Tianxiang Zhang, Zhifang Yang, Jiangyun Li · 2021

Computer science Engineering

Semantic segmentation for remote sensing images (RSIs) is widely applied in geological surveys, urban resources management, and disaster monitoring. Recent solutions on remote sensing segmentation tasks are generally addressed by CNN-based…

AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates Open

Ning Liu, Xiaolong Ma, Zhiyuan Xu, Yanzhi Wang, Jian Tang, et al. · 2020

Computer science Mathematics Biology

Structured weight pruning is a representative model compression technique of DNNs to reduce the storage and computation requirements and accelerate inference. An automatic hyperparameter determination process is necessary due to the large …

Asymptotic Soft Filter Pruning for Deep Convolutional Neural Networks Open

Yang He, Xuanyi Dong, Guoliang Kang, Yanwei Fu, Chenggang Yan, et al. · 2019

Computer science Biology

Deeper and wider convolutional neural networks (CNNs) achieve superior performance but bring expensive computation cost. Accelerating such overparameterized neural network has received increased attention. A typical pruning algorithm is a …

SF-YOLOv5: A Lightweight Small Object Detection Algorithm Based on Improved Feature Fusion Mode Open

Haiying Liu, Fengqian Sun, Jason Gu, Lixia Deng · 2022

Computer science Mathematics Physics

In the research of computer vision, a very challenging problem is the detection of small objects. The existing detection algorithms often focus on detecting full-scale objects, without making proprietary optimization for detecting small-si…

CvT: Introducing Convolutions to Vision Transformers Open

Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, et al. · 2021

Computer science Engineering Art

We present in this paper a new architecture, named Convolutional vision Transformer (CvT), that improves Vision Transformer (ViT) in performance and efficiency by introducing convolutions into ViT to yield the best of both designs. This is…

FLOPS