Dongbao Yang
The Role of Video Generation in Enhancing Data-Limited Action Understanding
Video action understanding tasks in real-world scenarios always suffer data limitations. In this paper, we address the data-limited action understanding problem by bridging data scarcity. We propose a novel method that employs a text-to-vi…
DCA: Dividing and Conquering Amnesia in Incremental Object Detection
Incremental object detection (IOD) aims to cultivate an object detector that can continuously localize and recognize novel classes while preserving its performance on previous classes. Existing methods achieve certain success by improving …
Specifying What You Know or Not for Multi-Label Class-Incremental Learning
Existing class incremental learning is mainly designed for single-label classification task, which is ill-equipped for multi-label scenarios due to the inherent contradiction of learning objectives for samples with incomplete labels. We ar…
Arbitrary Reading Order Scene Text Spotter with Local Semantics Guidance
Scene text spotting has attracted the enthusiasm of relative researchers in recent years. Most existing scene text spotters follow the detection-then-recognition paradigm, where the vanilla detection module hardly determines the reading or…
Robust Multimodal Sentiment Analysis of Image-Text Pairs by Distribution-Based Feature Recovery and Fusion
As posts on social media increase rapidly, analyzing the sentiments embedded in image-text pairs has become a popular research topic in recent years. Although existing works achieve impressive accomplishments in simultaneously harnessing i…
Bridging Visual Affective Gap: Borrowing Textual Knowledge by Learning from Noisy Image-Text Pairs
Visual emotion recognition (VER) is a longstanding field that has garnered increasing attention with the advancement of deep neural networks. Although recent studies have achieved notable improvements by leveraging the knowledge embedded w…
First Creating Backgrounds Then Rendering Texts: A New Paradigm for Visual Text Blending
Diffusion models, known for their impressive image generation abilities, have played a pivotal role in the rise of visual text generation. Nevertheless, existing visual text generation methods often focus on generating entire images with t…
TextCtrl: Diffusion-based Scene Text Editing with Prior Guidance Control
Centred on content modification and style preservation, Scene Text Editing (STE) remains a challenging task despite considerable progress in text-to-image synthesis and text-driven image manipulation recently. GAN-based STE methods general…
Focus, Distinguish, and Prompt: Unleashing CLIP for Efficient and Flexible Scene Text Retrieval
Scene text retrieval aims to find all images containing the query text from an image gallery. Current efforts tend to adopt an Optical Character Recognition (OCR) pipeline, which requires complicated text detection and/or recognition proce…
Resolving Sentiment Discrepancy for Multimodal Sentiment Detection via Semantics Completion and Decomposition
With the proliferation of social media posts in recent years, the need to detect sentiments in multimodal (image-text) content has grown rapidly. Since posts are user-generated, the image and text from the same post can express different o…
Pseudo Object Replay and Mining for Incremental Object Detection
Incremental object detection (IOD) aims to mitigate catastrophic forgetting for object detectors when incrementally learning to detect new emerging object classes without using original training data. Most existing IOD methods benefit from…
Perceiving Ambiguity and Semantics without Recognition: An Efficient and Effective Ambiguous Scene Text Detector
Ambiguous scene text detection is an extremely challenging task. Existing text detectors that rely solely on visual cues often suffer from confusion due to being evenly distributed in rows/columns or incomplete detection owing to large cha…
One-Shot Replay: Boosting Incremental Object Detection via Retrospecting One Object
Modern object detectors are ill-equipped to incrementally learn new emerging object classes over time due to the well-known phenomenon of catastrophic forgetting. Due to data privacy or limited storage, few or no images of the old data can…
Masked and Permuted Implicit Context Learning for Scene Text Recognition
Scene Text Recognition (STR) is difficult because of the variations in text styles, shapes, and backgrounds. Though the integration of linguistic information enhances models' performance, existing methods based on either permuted language …
Multi-View Correlation Distillation for Incremental Object Detection
In real applications, new object classes often emerge after the detection model has been trained on a prepared dataset with fixed classes. Due to the storage burden and the privacy of old data, sometimes it is impractical to train the mode…
Two-Level Residual Distillation based Triple Network for Incremental Object Detection
Modern object detection methods based on convolutional neural network suffer from severe catastrophic forgetting in learning new classes without original data. Due to time consumption, storage burden and privacy of old data, it is inadvisa…
Self-Training for Domain Adaptive Scene Text Detection
Though deep learning based scene text detection has achieved great progress, well-trained detectors suffer from severe performance degradation for different domains. In general, a tremendous amount of data is indispensable to train the det…
SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition
Scene text recognition is a hot research topic in computer vision. Recently, many recognition methods based on the encoder-decoder framework have been proposed, and they can handle scene texts of perspective distortion and curve shape. Nev…
Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning
We propose a novel self-supervised method, referred to as Video Cloze Procedure (VCP), to learn rich spatial-temporal representations. VCP first generates “blanks” by withholding video clips and then creates “options” by applying spatio-te…
Curved Text Detection in Natural Scene Images with Semi- and Weakly-Supervised Learning
Detecting curved text in the wild is very challenging. Recently, most state-of-the-art methods are segmentation based and require pixel-level annotations. We propose a novel scheme to train an accurate text detector using only a small amou…