Qize Yang
A Hierarchical Semantic Distillation Framework for Open-Vocabulary Object Detection
Open-vocabulary object detection (OVD) aims to detect objects beyond the training annotations, where detectors are usually aligned to a pre-trained vision-language model, e.g., CLIP, to inherit its generalizable recognition ability so that d…
LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models
Recent open-vocabulary detectors achieve promising performance with abundant region-level annotated data. In this work, we show that an open-vocabulary detector co-training with a large language model by generating image-level detailed cap…
HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding
In human-centric scenes, the ability to simultaneously understand visual and auditory information is crucial. While recent omni models can process multiple modalities, they generally lack effectiveness in human-centric scenes due to the ab…
Frozen-DETR: Enhancing DETR with Image Understanding from Frozen Foundation Models
Recent vision foundation models can extract universal representations and show impressive abilities in various tasks. However, their application on object detection is largely overlooked, especially without fine-tuning them. In this work, …
DreamView: Injecting View-specific Text Guidance into Text-to-3D Generation
Text-to-3D generation, which synthesizes 3D assets according to an overall text description, has significantly progressed. However, a challenge arises when the specific appearances need customizing at designated viewpoints but referring so…
PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Robust 3D Human Pose Estimation
Existing 3D human pose estimators face challenges in adapting to new datasets due to the lack of 2D-3D pose pairs in training sets. To overcome this issue, we propose Multi-Hypothesis Pose Synthesis Domai…
Interactive Self-Training with Mean Teachers for Semi-supervised Object Detection
The goal of semi-supervised object detection is to learn a detection model using only a few labeled data and large amounts of unlabeled data, thereby reducing the cost of data labeling. Although a few studies have proposed various self-tra…
Caspase-11-Gasdermin D-Mediated Pyroptosis Is Involved in the Pathogenesis of Atherosclerosis
Background: Pyroptosis is a form of cell death triggered by proinflammatory signals. Recent studies have reported that oxidized phospholipids function as caspase-11 agonists to induce noncanonical inflammasome activation in immune cells. A…
Triglycerides to total cholesterol ratio: an early screening tool for NAFLD in Chinese populations
Background: Non-alcoholic fatty liver disease (NAFLD) has a high prevalence in the general population worldwide. Both triglycerides (TG) and total cholesterol (TC) are correlated with the prevalence of NAFLD. The study purpose is to determin…
Rethinking Temporal Fusion for Video-Based Person Re-Identification on Semantic and Time Aspect
Recently, the research interest of person re-identification (ReID) has gradually turned to video-based methods, which acquire a person representation by aggregating frame features of an entire video. However, existing video-based ReID meth…
Person Re-Identification by Contour Sketch Under Moderate Clothing Change
Person re-identification (re-id), the process of matching pedestrian images across different camera views, is an important task in visual surveillance. Substantial development of re-id has recently been observed, and the majority of existi…