Qi Zhao
R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning
Large language models (LLMs) have notably progressed in multi-step and long-chain reasoning. However, extending their reasoning capabilities to encompass deep interactions with search remains a non-trivial challenge, as models often fail t…
CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models
This paper introduces CameraCtrl II, a framework that enables large-scale dynamic scene exploration through a camera-controlled video diffusion model. Previous camera-conditioned video generative models suffer from diminished video dynamic…
Decoding Gestures in Electromyography: Spatiotemporal Graph Neural Networks for Generalizable and Interpretable Classification
In recent years, significant strides in deep learning have propelled the advancement of electromyography (EMG)-based upper-limb gesture recognition systems, yielding notable successes across a spectrum of domains, including rehabilitation,…
Is Your Text-to-Image Model Robust to Caption Noise?
In text-to-image (T2I) generation, a prevalent training technique involves utilizing Vision Language Models (VLMs) for image re-captioning. Even though VLMs are known to exhibit hallucination, generating descriptive content that deviates f…
PrefRAG: Preference-Driven Multi-Source Retrieval Augmented Generation
Retrieval-Augmented Generation (RAG) has emerged as a reliable external knowledge augmentation technique to mitigate hallucination issues and parameterized knowledge limitations in Large Language Models (LLMs). Existing adaptive RAG (ARAG)…
LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering
Long-Context Question Answering (LCQA), a challenging task, aims to reason over long-context documents to yield accurate answers to questions. Existing long-context Large Language Models (LLMs) for LCQA often struggle with the "lost in the…
DeepNote: Note-Centric Deep Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) mitigates factual errors and hallucinations in Large Language Models (LLMs) for question-answering (QA) by incorporating external knowledge. However, existing adaptive RAG methods rely on LLMs to predic…
Unveiling EMG semantics: a prototype-learning approach to generalizable gesture classification
Objective. Upper limb loss can profoundly impact an individual’s quality of life, posing challenges to both physical capabilities and emotional well-being. To restore limb function by decoding electromyography (EMG) signals, in this paper,…
Beyond Average: Individualized Visual Scanpath Prediction
Understanding how attention varies across individuals has significant scientific and societal impacts. However, existing visual scanpath models treat attention uniformly, neglecting individual differences. To bridge this gap, this paper fo…
MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation
We present MicroCinema, a straightforward yet effective framework for high-quality and coherent text-to-video generation. Unlike existing approaches that align text prompts with video directly, MicroCinema introduces a Divide-and-Conquer s…
What Do Deep Saliency Models Learn about Visual Attention?
In recent years, deep saliency models have made significant progress in predicting human visual attention. However, the mechanisms behind their success remain largely unexplained due to the opaque nature of deep neural networks. In this pa…
Evaluating longitudinal relationships between parental monitoring and substance use in a multi-year, intensive longitudinal study of 670 adolescent twins
Introduction: Parental monitoring is a key intervention target for adolescent substance use; however, this practice is largely supported by causally uninformative cross-sectional or sparse-longitudinal observational research designs. Methods…
DGC-CRL: Dependency Graph Convolution based Contrastive Representation Learning for Chinese Medical Question Matching
As one kind of domain-specific question answering (QA) system, medical QA systems require greater stability, speed, and response accuracy. Therefore, retrieval-based QA systems are more suitable, among which the deep …
Computational and Mathematical Methods in Medicine Prediction of COVID-19 in BRICS Countries: An Integrated Deep Learning Model of CEEMDAN-R-ILSTM-Elman
Since the outbreak of COVID-19, BRICS countries have experienced different epidemic spread due to different health conditions, social isolation measures, vaccination rates, and other factors. A descriptive analysis is conducted for the spr…
A portable, self-contained neuroprosthetic hand with deep learning-based finger control
Objective. Deep learning-based neural decoders have emerged as the prominent approach to enable dexterous and intuitive control of neuroprosthetic hands. Yet few studies have materialized the use of deep learning in clinical settings due t…
Leveraging Human Attention in Novel Object Captioning
Image captioning models depend on training with paired image-text corpora, which poses various challenges in describing images containing novel objects absent from the training data. While previous novel object captioning methods rely on e…
Deep Learning-Based Approaches for Decoding Motor Intent From Peripheral Nerve Signals
Previous literature shows that deep learning is an effective tool to decode the motor intent from neural signals obtained from different parts of the nervous system. However, deep neural networks are often computationally complex and not f…
A Portable, Self-Contained Neuroprosthetic Hand with Deep Learning-Based Finger Control
Objective: Deep learning-based neural decoders have emerged as the prominent approach to enable dexterous and intuitive control of neuroprosthetic hands. Yet few studies have materialized the use of deep learning in clinical settings due t…
Deep Learning-Based Approaches for Decoding Motor Intent from Peripheral Nerve Signals
The ultimate goal of an upper-limb neuroprosthesis is to achieve dexterous and intuitive control of individual fingers. Previous literature shows that deep learning (DL) is an effective tool to decode the motor intent from neural signals o…
A bioelectric neural interface towards intuitive prosthetic control for amputees
Objective: While prosthetic hands with independently actuated digits have become commercially available, state-of-the-art human-machine interfaces (HMI) only permit control over a limited set of grasp patterns, which does not enable amputee…
AiR: Attention with Reasoning Capability
While attention has been an increasingly popular component in deep neural networks to both interpret and boost performance of models, little work has examined how attention progresses to accomplish a task and whether it is reasonable. In t…
Saliency Prediction with External Knowledge
The last decades have seen great progress in saliency prediction, with the success of deep neural networks that are able to encode high-level semantics. Yet, while humans have the innate capability in leveraging their knowledge to decide w…
$n$-Reference Transfer Learning for Saliency Prediction
Benefiting from deep learning research and large-scale datasets, saliency prediction has achieved significant success in the past decade. However, it still remains challenging to predict saliency maps on images in new domains that lack suf…
GradMix: Multi-source Transfer across Domains and Tasks
The computer vision community is witnessing an unprecedented rate of new tasks being proposed and addressed, thanks to the deep convolutional networks' capability to find complex mappings from X to Y. The advent of each task often accompan…
Direction Concentration Learning: Enhancing Congruency in Machine Learning
One of the well-known challenges in computer vision tasks is the visual diversity of images, which could result in an agreement or disagreement between the learned knowledge and the visual content exhibited by the current observation. In t…
Human Annotations Improve GAN Performances
Generative Adversarial Networks (GANs) have shown great success in many applications. In this work, we present a novel method that leverages human annotations to improve the quality of generated images. Unlike previous paradigms that direc…
Human motor decoding from neural signals: a review
Video Storytelling: Textual Summaries for Events
Bridging vision and natural language is a longstanding goal in computer vision and multimedia research. While earlier works focus on generating a single-sentence description for visual content, recent works have studied paragraph generatio…
Learning to Learn From Noisy Labeled Data
Despite the success of deep neural networks (DNNs) in image classification tasks, the human-level performance relies on massive training data with high-quality manual annotations, which are expensive and time-consuming to collect. There ex…