Hong-Yuan Mark Liao
YOLOv1 to YOLOv10: The fastest and most accurate real-time object detection systems
This is a comprehensive review of the YOLO series of systems. Different from previous literature surveys, this review article re-examines the characteristics of the YOLO series from the latest technical point of view. At the same time, we …
YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
Today's deep learning methods focus on how to design the most appropriate objective functions so that the prediction results of the model can be closest to the ground truth. Meanwhile, an appropriate architecture that can facilitate acquis…
YOLOv1 to YOLOv10: The Fastest and Most Accurate Real-time Object Detection Systems
YOLOR-Based Multi-Task Learning
Multi-task learning (MTL) aims to learn multiple tasks using a single model and jointly improve all of them assuming generalization and shared semantics. Reducing conflicts between tasks during joint learning is difficult and generally req…
NeighborTrack: Improving Single Object Tracking by Bipartite Matching with Neighbor Tracklets
We propose a post-processor, called NeighborTrack, that leverages neighbor information of the tracking target to validate and improve single-object tracking (SOT) results. It requires no additional data or retraining. Instead, it uses the …
Designing Network Design Strategies Through Gradient Path Analysis
Designing a high-efficiency and high-quality expressive network architecture has always been the most important research topic in the field of deep learning. Most of today's network design strategies focus on how to integrate features extr…
SearchTrack: Multiple Object Tracking with Object-Customized Search and Motion-Aware Features
The paper presents a new method, SearchTrack, for multiple object tracking and segmentation (MOTS). To address the association problem between detected objects, SearchTrack proposes object-customized search and motion-aware features. By ma…
YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS and has the highest accuracy 56.8% AP among all known real-time object detectors with 30 FPS or higher on GPU V100. YOLOv7-E6 object …
PADAr: physician-oriented artificial intelligence-facilitating diagnosis aid for retinal diseases
Purpose: Retinopathy screening via digital imaging is promising for early detection and timely treatment, and tracking retinopathic abnormality over time can help to reveal the risk of disease progression. We developed an innovative…
Capturing Humans in Motion: Temporal-Attentive 3D Human Pose and Shape Estimation from Monocular Video
Learning to capture human motion is essential to 3D human pose and shape estimation from monocular video. However, the existing methods mainly rely on recurrent or convolutional operation to model such temporal information, which limits th…
You Only Learn One Representation: Unified Network for Multiple Tasks
People "understand" the world via vision, hearing, tactile, and also the past experience. Human experience can be learned through normal learning (we call it explicit knowledge), or subconsciously (we call it implicit knowledge). These e…
End-to-End Key-Player-Based Group Activity Recognition Network Applied to Basketball Offensive Tactic Identification in Limited Data Scenarios
In this paper, we propose an end-to-end key-player-based group activity recognition network specially applied to the identification of basketball offensive tactics in limited data scenarios. Our previous studies show that basketball tactic…
Scaled-YOLOv4: Scaling Cross Stage Partial Network
We show that the YOLOv4 object detection neural network based on the CSP approach, scales both up and down and is applicable to small and large networks while maintaining optimal speed and accuracy. We propose a network scaling approach th…
Batch-Augmented Multi-Agent Reinforcement Learning for Efficient Traffic Signal Optimization
The goal of this work is to provide a viable solution based on reinforcement learning for traffic signal control problems. Although the state-of-the-art reinforcement learning approaches have yielded great success in a variety of domains, …
YOLOv4: Optimal Speed and Accuracy of Object Detection
There are a huge number of features which are said to improve Convolutional Neural Network (CNN) accuracy. Practical testing of combinations of such features on large datasets, and theoretical justification of the result, is required. Some…
Residual Bi-Fusion Feature Pyramid Network for Accurate Single-shot Object Detection
State-of-the-art (SoTA) models have improved the accuracy of object detection with a large margin via a FP (feature pyramid). FP is a top-down aggregation to collect semantically strong features to improve scale invariance in both two-stag…
CSPNet: A New Backbone that can Enhance Learning Capability of CNN
Neural networks have enabled state-of-the-art approaches to achieve incredible results on computer vision tasks such as object detection. However, such success greatly relies on costly computation resources, which hinders people with cheap…
A New Target-specific Object Proposal Generation Method for Visual Tracking
Object proposal generation methods have been widely applied to many computer vision tasks. However, existing object proposal generation methods often suffer from the problems of motion blur, low contrast, deformation, etc., when they are a…
Automatic Image Cropping for Visual Aesthetic Enhancement Using Deep Neural Networks and Cascaded Regression
Despite recent progress, computational visual aesthetic is still challenging. Image cropping, which refers to the removal of unwanted scene areas, is an important step to improve the aesthetic quality of an image. However, it is challengin…
Hierarchical Cross Network for Person Re-identification
Person re-identification (person re-ID) aims at matching target person(s) grabbed from different and non-overlapping camera views. It plays an important role for public safety and has application in various tasks such as human retrieval, …