Explanipedia

Mitigating the Noise Shift for Denoising Generative Models via Noise Awareness Guidance

Jincheng Zhong, Boyuan Jiang, Tao Xin, Pengfei Wan, Kun Gai, et al. · 2025

Existing denoising generative models rely on solving discretized reverse-time SDEs or ODEs. In this paper, we identify a long-overlooked yet pervasive issue in this family of models: a misalignment between the pre-defined noise level and t…

VFRTok: Variable Frame Rates Video Tokenizer with Duration-Proportional Information Assumption

Tianxiong Zhong, Xingye Tian, Boyuan Jiang, Xuebo Wang, Tao Xin, et al. · 2025

Modern video generation frameworks based on Latent Diffusion Models suffer from inefficiencies in tokenization due to the Frame-Proportional Information Assumption. Existing tokenizers provide fixed temporal compression rates, causing the …

CrossVTON: Mimicking the Logic Reasoning on Cross-category Virtual Try-on guided by Tri-zone Priors

Donghao Luo, Yujie Liang, Peng Xu, Xiaobin Hu, Boyuan Jiang, et al. · 2025

Computer science

Despite remarkable progress in image-based virtual try-on systems, generating realistic and robust fitting images for cross-category virtual try-on remains a challenging task. The primary difficulty arises from the absence of human-like re…

DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation

Qingdong He, Jinlong Peng, Pengcheng Xu, Boyuan Jiang, Xiaobin Hu, et al. · 2024

Computer science

To enhance the controllability of text-to-image diffusion models, current ControlNet-like models have explored various control signals to dictate image attributes. However, existing methods either handle conditions inefficiently or use a f…

Unveil Inversion and Invariance in Flow Transformer for Versatile Image Editing

Pengcheng Xu, Boyuan Jiang, Xiaobin Hu, Donghao Luo, Qingdong He, et al. · 2024

Computer science Geology Engineering

Leveraging the large generative prior of the flow transformer for tuning-free image editing requires authentic inversion to project the image into the model's domain and a flexible invariance control mechanism to preserve non-target conten…

VIVID-10M: A Dataset and Baseline for Versatile and Interactive Video Local Editing

Jiahao Hu, Tianxiong Zhong, Xuebo Wang, Boyuan Jiang, Xingye Tian, et al. · 2024

Computer science Political science

Diffusion-based image editing models have made remarkable progress in recent years. However, achieving high-quality video editing remains a significant challenge. One major hurdle is the absence of open-source, large-scale video editing da…

FitDiT: Advancing the Authentic Garment Details for High-fidelity Virtual Try-on

Boyuan Jiang, Xiaobin Hu, Donghao Luo, Qingdong He, Chengming Xu, et al. · 2024

Computer science Engineering

Although image-based virtual try-on has made considerable progress, emerging approaches still encounter challenges in producing high-fidelity and robust fitting images across diverse scenarios. These methods often struggle with issues such…

Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content

Qiuheng Wang, Yukai Shi, Jun‐Yu Ou, Rui Chen, Lin Ke, et al. · 2024

Computer science Geography Mathematics

With the continuous progress of visual generation technologies, the scale of video datasets has grown exponentially. The quality of these datasets plays a pivotal role in the performance of video generation models. We assert that temporal …

VTON-HandFit: Virtual Try-on for Arbitrary Hand Pose Guided by Hand Priors Embedding

Yujie Liang, Xiaobin Hu, Boyuan Jiang, Donghao Luo, Kai Wu, et al. · 2024

Computer science Mathematics

Although diffusion-based image virtual try-on has made considerable progress, emerging approaches still struggle to effectively address the issue of hand occlusion (i.e., clothing regions occluded by the hand part), leading to a notable de…

Oracle Bone Inscriptions Multi-modal Dataset

Bang Li, Donghao Luo, Yujie Liang, Jing Yang, Zengmao Ding, et al. · 2024

Computer science Chemistry

Oracle bone inscriptions (OBI) is the earliest developed writing system in China, bearing invaluable written exemplifications of early Shang history and paleography. However, the task of deciphering OBI, in the current climate of the schola…

NoiseBoost: Alleviating Hallucination with Noise Perturbation for Multimodal Large Language Models

Kaichun Wu, Boyuan Jiang, Zhengkai Jiang, Qingdong He, Donghao Luo, et al. · 2024

Computer science Psychology Physics

Multimodal large language models (MLLMs) contribute a powerful mechanism to understanding visual information building on large language models. However, MLLMs are notorious for suffering from hallucinations, especially when generating leng…

A Multimodal, Multi-Task Adapting Framework for Video Action Recognition

Mengmeng Wang, Jiazheng Xing, Boyuan Jiang, Jun Chen, Jianbiao Mei, et al. · 2024

Computer science Psychology Engineering

Recently, the rise of large-scale vision-language pretrained models like CLIP, coupled with the technology of Parameter-Efficient FineTuning (PEFT), has captured substantial attraction in video action recognition. Nevertheless, prevailing …

M2-CLIP: A Multimodal, Multi-task Adapting Framework for Video Action Recognition

Mengmeng Wang, Jiazheng Xing, Boyuan Jiang, Jun Chen, Jianbiao Mei, et al. · 2024

Computer science Mathematics Economics

Recently, the rise of large-scale vision-language pretrained models like CLIP, coupled with the technology of Parameter-Efficient FineTuning (PEFT), has captured substantial attraction in video action recognition. Nevertheless, prevailing …

PortraitBooth: A Versatile Portrait Model for Fast Identity-preserved Personalization

Peng Xu, Junwei Zhu, Boyuan Jiang, Ying Tai, Donghao Luo, et al. · 2023

Computer science Physics

Recent advancements in personalized image generation using diffusion models have been noteworthy. However, existing methods suffer from inefficiencies due to the requirement for subject-specific fine-tuning. This computationally intensive …

Probabilistic Triangulation for Uncalibrated Multi-View 3D Human Pose Estimation

Boyuan Jiang, Lei Hu, Shihong Xia · 2023

Computer science Mathematics

3D human pose estimation has been a long-standing challenge in computer vision and graphics, where multi-view methods have significantly progressed but are limited by the tedious calibration processes. Existing multi-view methods are restr…

Dynamic Frame Interpolation in Wavelet Domain

Lingtong Kong, Boyuan Jiang, Donghao Luo, Wenqing Chu, Ying Tai, et al. · 2023

Computer science

Video frame interpolation is an important low-level vision task, which can increase frame rate for more fluent visual experience. Existing methods have achieved great success by employing advanced motion models and synthesis networks. Howe…

Pose-Aware Attention Network for Flexible Motion Retargeting by Body Part

Lei Hu, Zihao Zhang, Chongyang Zhong, Boyuan Jiang, Shihong Xia · 2023

Computer science

Motion retargeting is a fundamental problem in computer graphics and computer vision. Existing approaches usually have many strict requirements, such as the source-target skeletons needing to have the same number of joints or share the sam…

IFRNet: Intermediate Feature Refine Network for Efficient Frame Interpolation

Lingtong Kong, Boyuan Jiang, Donghao Luo, Wenqing Chu, Xiaoming Huang, et al. · 2022

Computer science Physics Philosophy

Prevailing video frame interpolation algorithms, that generate the intermediate frames from consecutive inputs, typically rely on complex model architectures with heavy parameters or large delay, hindering them from diverse real-time appli…

Quantitative susceptibility mapping to evaluate brain iron deposition and its correlation with physiological parameters in hypertensive patients

Xin Li, Dayong Jin, Yinhu Zhu, Liyao Liu, Yanqiang Qiao, et al. · 2021

Medicine Chemistry Psychology

These results are indicative of the role of overload brain iron in deep brain gray matter nuclei in HP and suggest that HP is associated with excess brain iron in certain deep gray matter regions.

Learning Comprehensive Motion Representation for Action Recognition

Mingyu Wu, Boyuan Jiang, Donghao Luo, Junchi Yan, Yabiao Wang, et al. · 2021

Computer science Philosophy Political science

For action recognition learning, 2D CNN-based methods are efficient but may yield redundant features due to applying the same 2D convolution kernel to each frame. Recent efforts attempt to capture motion information by establishing inter-f…

Multi-Level Adaptive Region of Interest and Graph Learning for Facial Action Unit Recognition

Jingwei Yan, Boyuan Jiang, Jingjing Wang, Qiang Li, Chunmao Wang, et al. · 2021

Computer science Philosophy Political science

In facial action unit (AU) recognition tasks, regional feature learning and AU relation modeling are two effective aspects which are worth exploring. However, the limited representation capacity of regional features makes it difficult for …

Imputation of Missing Traffic Flow Data Using Denoising Autoencoders

Boyuan Jiang, Muhammad Danial Siddiqi, Reza Asadi, Amelia Regan · 2021

Computer science

In transportation engineering, spatio-temporal data including traffic flow, speed, and occupancy are collected from different kinds of sensors and used by transportation engineers for analysis. However, the missing data influence the analy…

Hyperparameter Tuning to Optimize Implementations of Denoising Autoencoders for Imputation of Missing Spatio-temporal Data

Muhammad Danial Siddiqi, Boyuan Jiang, Reza Asadi, Amelia Regan · 2021

Computer science

Spatio-temporal data collected from sensors can sometimes have gaps where data is missing. Transportation planners and engineers use such data to perform various different types of analyses, but the gaps in the data make it difficult to ma…

STM: SpatioTemporal and Motion Encoding for Action Recognition

Boyuan Jiang, Mengmeng Wang, Weihao Gan, Wei Wu, Junjie Yan · 2019

Computer science Mathematics Chemistry

Spatiotemporal and motion features are two complementary and crucial information for video action recognition. Recent state-of-the-art methods adopt a 3D CNN stream to learn spatiotemporal features and another flow stream to learn motion f…

Joint Domain Alignment and Discriminative Feature Learning for Unsupervised Deep Domain Adaptation

Chao Chen, Zhihong Chen, Boyuan Jiang, Xinyu Jin · 2019

Computer science Mathematics Philosophy

Recently, considerable effort has been devoted to deep domain adaptation in computer vision and machine learning communities. However, most of existing work only concentrates on learning shared feature representation by minimizing the dist…

Selective Transfer with Reinforced Transfer Network for Partial Domain Adaptation

Zhihong Chen, Chao Chen, Zhaowei Cheng, Boyuan Jiang, Ke Fang, et al. · 2019

Computer science Mathematics Philosophy

One crucial aspect of partial domain adaptation (PDA) is how to select the relevant source samples in the shared classes for knowledge transfer. Previous PDA methods tackle this problem by re-weighting the source samples based on their hig…

Joint Domain Alignment and Discriminative Feature Learning for Unsupervised Deep Domain Adaptation

Chao Chen, Zhihong Chen, Boyuan Jiang, Xinyu Jin · 2018

Computer science Mathematics Philosophy

Recently, considerable effort has been devoted to deep domain adaptation in computer vision and machine learning communities. However, most of existing work only concentrates on learning shared feature representation by minimizing the dist…

Boyuan Jiang