Guangliang Cheng
ViTaL: A Multimodality Dataset and Benchmark for Multi-pathological Ovarian Tumor Recognition Open
Ovarian tumor, as a common gynecological disease, can rapidly deteriorate into serious health crises when undetected early, thus posing significant threats to the health of women. Deep neural networks have the potential to identify ovarian…
DUAL: Dynamic Uncertainty-Aware Learning Open
Deep learning models frequently encounter feature uncertainty in diverse learning scenarios, significantly impacting their performance and reliability. This challenge is particularly complex in multi-modal scenarios, where models must inte…
Preference Alignment on Diffusion Model: A Comprehensive Survey for Image Generation and Editing Open
The integration of preference alignment with diffusion models (DMs) has emerged as a transformative approach to enhance image generation and editing capabilities. Although integrating diffusion models with preference alignment strategies p…
Position: Towards a Responsible LLM-empowered Multi-Agent Systems Open
The rise of Agent AI and Large Language Model-powered Multi-Agent Systems (LLM-MAS) has underscored the need for responsible and dependable system operation. Tools like LangChain and Retrieval-Augmented Generation have expanded LLM capabil…
BEARD: Benchmarking the Adversarial Robustness for Dataset Distillation Open
Dataset Distillation (DD) is an emerging technique that compresses large-scale datasets into significantly smaller synthesized datasets while preserving high test performance and enabling the efficient training of large models. However, cu…
Joint-Optimized Unsupervised Adversarial Domain Adaptation in Remote Sensing Segmentation with Prompted Foundation Model Open
Unsupervised Domain Adaptation for Remote Sensing Semantic Segmentation (UDA-RSSeg) addresses the challenge of adapting a model trained on source domain data to target domain samples, thereby minimizing the need for annotated data across d…
Shape-Dependent Dynamic Label Assignment for Oriented Remote Sensing Object Detection Open
Oriented remote sensing object detection (ORSOD) has gained increasing significance in both military and civilian applications due to the necessity of accurately identifying objects with varying shapes and orientations in remote sensing da…
Robust RL with LLM-Driven Data Synthesis and Policy Adaptation for Autonomous Driving Open
The integration of Large Language Models (LLMs) into autonomous driving systems demonstrates strong common sense and reasoning abilities, effectively addressing the pitfalls of purely data-driven methods. Current LLM-based agents require l…
OVGNet: A Unified Visual-Linguistic Framework for Open-Vocabulary Robotic Grasping Open
Recognizing and grasping novel-category objects remains a crucial yet challenging problem in real-world robotic applications. Despite its significance, limited research has been conducted in this specific domain. To address this, we seamle…
MSTF: Multiscale Transformer for Incomplete Trajectory Prediction Open
Motion forecasting plays a pivotal role in autonomous driving systems, enabling vehicles to execute collision warnings and rational local-path planning based on predictions of the surrounding vehicles. However, prevalent methods often assu…
BACON: Bayesian Optimal Condensation Framework for Dataset Distillation Open
Dataset Distillation (DD) aims to distill knowledge from extensive datasets into more compact ones while preserving performance on the test set, thereby reducing storage costs and training expenses. However, existing methods often suffer f…
Efficient Decoder and Intermediate Domain for Semantic Segmentation in Adverse Conditions Open
In smart city contexts, traditional methods for semantic segmentation are affected by adverse conditions, such as rain, fog, or darkness. One challenge is the limited availability of semantic segmentation datasets, specifically for autonom…
Self-training guided disentangled adaptation for cross-domain remote sensing image semantic segmentation Open
Remote sensing (RS) image semantic segmentation using deep convolutional neural networks (DCNNs) has shown great success in various applications. However, the high dependence on annotated data makes it challenging for DCNNs to adapt to dif…
DST-Det: Simple Dynamic Self-Training for Open-Vocabulary Object Detection Open
Open-vocabulary object detection (OVOD) aims to detect objects beyond the set of classes observed during training. This work introduces a straightforward and efficient strategy that utilizes pre-trained vision-language models (VLM), li…
Sfnet: Faster and Accurate Semantic Segmentation Via Semantic Flow Open
In this paper, we focus on exploring effective methods for faster and accurate semantic segmentation. A common practice to improve the performance is to attain high-resolution feature maps with strong semantic representation. Two strategie…
Learn by Oneself: Exploiting Weight-Sharing Potential in Knowledge Distillation Guided Ensemble Network Open
Recent CNNs (convolutional neural networks) have become more and more compact. The elegant structure design highly improves the performance of CNNs. With the development of knowledge distillation technique, the performance of CNNs gets fur…
Tube-Link: A Flexible Cross Tube Framework for Universal Video Segmentation Open
Video segmentation aims to segment and track every pixel in diverse scenarios accurately. In this paper, we present Tube-Link, a versatile framework that addresses multiple core tasks of video segmentation with a unified architecture. Our …
Local-to-Global Information Communication for Real-Time Semantic Segmentation Network Search Open
Neural Architecture Search (NAS) has shown great potential in automatically designing neural network architectures for real-time semantic segmentation. Unlike previous works that utilize a simplified search space with cell sharing, we…
PanopticPartFormer++: A Unified and Decoupled View for Panoptic Part Segmentation Open
Panoptic Part Segmentation (PPS) unifies panoptic and part segmentation into one task. Previous works utilize separate approaches to handle things, stuff, and part predictions without shared computation and task association. We aim to unif…
TransVOD: End-to-End Video Object Detection With Spatial-Temporal Transformers Open
Detection Transformer (DETR) and Deformable DETR have been proposed to eliminate the need for many hand-designed components in object detection while performing as well as previous complex hand-crafted detectors. However, their…
Reconstruct from BEV: A 3D Lane Detection Approach based on Geometry Structure Prior Open
In this paper, we propose an advanced approach to monocular 3D lane detection that leverages the geometric structure underlying the 2D-to-3D lane reconstruction process. Inspired by previous methods, we first analyze …
Multi-level Domain Adaptation for Lane Detection Open
We focus on bridging domain discrepancy in lane detection among different scenarios to greatly reduce extra annotation and re-training costs for autonomous driving. Critical factors hinder the performance improvement of cross-domain lane d…
Fashionformer: A simple, Effective and Unified Baseline for Human Fashion Segmentation and Recognition Open
Human fashion understanding is a crucial computer vision task because it provides comprehensive information for real-world applications. This work focuses on joint human fashion segmentation and attribute recognition. Contrary to the previous works th…
Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation Open
This paper presents Video K-Net, a simple, strong, and unified framework for fully end-to-end video panoptic segmentation. The method is built upon K-Net, a method that unifies image segmentation via a group of learnable kernels. We observ…
PIT: Position-Invariant Transform for Cross-FoV Domain Adaptation Open
Cross-domain object detection and semantic segmentation have witnessed impressive progress recently. Existing approaches mainly consider the domain shift resulting from external environments including the changes of background, illuminatio…