FLOPS
YOLOX: Exceeding YOLO Series in 2021
In this report, we present some experienced improvements to YOLO series, forming a new high-performance detector -- YOLOX. We switch the YOLO detector to an anchor-free manner and conduct other advanced detection techniques, i.e., a decoup…
ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation
The ability to perform pixel-wise semantic segmentation in real-time is of paramount importance in mobile applications. Recent deep neural networks aimed at this task have the disadvantage of requiring a large number of floating point oper…
Conversion of Continuous-Valued Deep Networks to Efficient Event-Driven Networks for Image Classification
Spiking neural networks (SNNs) can potentially offer an efficient way of doing inference because the neurons in the networks are sparsely activated and computations are event-driven. Previous work showed that simple continuous-valued deep …
Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks
This paper proposed a Soft Filter Pruning (SFP) method to accelerate the inference procedure of deep Convolutional Neural Networks (CNNs). Specifically, the proposed SFP enables the pruned filters to be updated when training the model afte…
ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
We introduce an extremely computation-efficient CNN architecture named ShuffleNet, which is designed specially for mobile devices with very limited computing power (e.g., 10-150 MFLOPs). The new architecture utilizes two new operations, po…
Learning Transferable Architectures for Scalable Image Recognition
Developing neural network image classification models often requires significant architecture engineering. In this paper, we study a method to learn the model architectures directly on the dataset of interest. As this approach is expensive…
EfficientDet: Scalable and Efficient Object Detection
Model efficiency has become increasingly important in computer vision. In this paper, we systematically study neural network architecture design choices for object detection and propose several key optimizations to improve efficiency. Firs…
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Transformers are slow and memory-hungry on long sequences, since the time and memory complexity of self-attention are quadratic in sequence length. Approximate attention methods have attempted to address this problem by trading off model q…
ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks
Recently, channel attention mechanism has demonstrated to offer great potential in improving the performance of deep convolutional neural networks (CNNs). However, most existing methods dedicate to developing more sophisticated attention m…
Visual Transformers: Token-based Image Representation and Processing for Computer Vision
Computer vision has achieved remarkable success by (a) representing images as uniformly-arranged pixel arrays and (b) convolving highly-localized features. However, convolutions treat all image pixels equally regardless of importance; expl…
Early Convolutions Help Transformers See Better
Vision transformer (ViT) models exhibit substandard optimizability. In particular, they are sensitive to the choice of optimizer (AdamW vs. SGD), optimizer hyperparameters, and training schedule length. In comparison, modern convolutional …
DeepViT: Towards Deeper Vision Transformer
Vision transformers (ViTs) have been successfully applied in image classification tasks recently. In this paper, we show that, unlike convolution neural networks (CNNs) that can be improved by stacking more convolutional layers, the perform…
Residual Attention Network for Image Classification
In this work, we propose "Residual Attention Network", a convolutional neural network using attention mechanism which can incorporate with state-of-art feed forward network architecture in an end-to-end training fashion. Our Residual Atten…
Deep Roots: Improving CNN Efficiency with Hierarchical Filter Groups
CVPR 2017 Poster for "Deep Roots: Improving CNN Efficiency With Hierarchical Filter Groups", to be presented at conference
MixConv: Mixed Depthwise Convolutional Kernels
Depthwise convolution is becoming increasingly popular in modern efficient ConvNets, but its kernel size is often overlooked. In this paper, we systematically study the impact of different kernel sizes, and observe that combining the benef…
BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View
Autonomous driving perceives its surroundings for decision making, which is one of the most complex scenarios in visual perception. The success of paradigm innovation in solving the 2D object detection task inspires us to seek an elegant, …
GhostNetV2: Enhance Cheap Operation with Long-Range Attention
Light-weight convolutional neural networks (CNNs) are specially designed for applications on mobile devices with faster inference speed. The convolutional operation can only capture local information in a window region, which prevents perf…
EfficientFormer: Vision Transformers at MobileNet Speed
Vision Transformers (ViT) have shown rapid progress in computer vision tasks, achieving promising results on various benchmarks. However, due to the massive number of parameters and model design, e.g., attention mechanism, ViT-bas…
SlimYOLOv3: Narrower, Faster and Better for Real-Time UAV Applications
Drones or general Unmanned Aerial Vehicles (UAVs), endowed with computer vision function by on-board cameras and embedded systems, have become popular in a wide range of applications. However, real-time scene parsing through object detecti…
Single Path One-Shot Neural Architecture Search with Uniform Sampling
We revisit the one-shot Neural Architecture Search (NAS) paradigm and analyze its advantages over existing NAS approaches. Existing one-shot method, however, is hard to train and not yet effective on large scale datasets like ImageNet. Thi…
PP-YOLO: An Effective and Efficient Implementation of Object Detector
Object detection is one of the most important areas in computer vision, which plays a key role in various practical scenarios. Due to limitation of hardware, it is often necessary to sacrifice accuracy to ensure the infer speed of the dete…
ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design
Currently, the neural network architecture design is mostly guided by the indirect metric of computation complexity, i.e., FLOPs. However, the direct metric, e.g., speed, also depends on the other factors such as memory acces…
UNETR++: Delving Into Efficient and Accurate 3D Medical Image Segmentation
Owing to the success of transformer models, recent works study their applicability in 3D medical segmentation tasks. Within the transformer models, the self-attention mechanism is one of the main building blocks that strives to capture lon…
Stand-Alone Self-Attention in Vision Models
Convolutions are a fundamental building block of modern computer vision systems. Recent approaches have argued for going beyond convolutions in order to capture long-range dependencies. These efforts focus on augmenting convolutional model…
Efficient Transformer for Remote Sensing Image Segmentation
Semantic segmentation for remote sensing images (RSIs) is widely applied in geological surveys, urban resources management, and disaster monitoring. Recent solutions on remote sensing segmentation tasks are generally addressed by CNN-based…
AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates
Structured weight pruning is a representative model compression technique of DNNs to reduce the storage and computation requirements and accelerate inference. An automatic hyperparameter determination process is necessary due to the large …
Asymptotic Soft Filter Pruning for Deep Convolutional Neural Networks
Deeper and wider convolutional neural networks (CNNs) achieve superior performance but bring expensive computation cost. Accelerating such overparameterized neural network has received increased attention. A typical pruning algorithm is a …
SF-YOLOv5: A Lightweight Small Object Detection Algorithm Based on Improved Feature Fusion Mode
In the research of computer vision, a very challenging problem is the detection of small objects. The existing detection algorithms often focus on detecting full-scale objects, without making proprietary optimization for detecting small-si…
CvT: Introducing Convolutions to Vision Transformers
We present in this paper a new architecture, named Convolutional vision Transformer (CvT), that improves Vision Transformer (ViT) in performance and efficiency by introducing convolutions into ViT to yield the best of both designs. This is…