Fuchen Long
HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer
Recent advancements in image generative foundation models have prioritized quality improvements but often at the cost of increased computational complexity and inference latency. To address this critical trade-off, we introduce HiDream-I1,…
MotionPro: A Precise Motion Controller for Image-to-Video Generation
Animating images with interactive motion control has garnered popularity for image-to-video (I2V) generation. Modern approaches typically rely on large Gaussian kernels to extend motion trajectories as condition without explicitly defining…
Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion
The first-in-first-out (FIFO) video diffusion, built on a pre-trained text-to-video model, has recently emerged as an effective approach for tuning-free long video generation. This technique maintains a queue of video frames with progressi…
Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution
Diffusion models are just at a tipping point for the image super-resolution task. Nevertheless, it is not trivial to capitalize on diffusion models for video super-resolution, which necessitates not only the preservation of visual appearance fr…
TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models
Recent advances in text-to-video generation have demonstrated the utility of powerful diffusion models. Nevertheless, the problem is not trivial when shaping diffusion models to animate a static image (i.e., image-to-video generation). The d…
VideoStudio: Generating Consistent-Content and Multi-Scene Videos
The recent innovations and breakthroughs in diffusion models have significantly expanded the possibilities of generating high-quality videos for the given prompts. Most existing works tackle the single-scene scenario with only one video ev…
Dynamic Temporal Filtering in Video Models
Video temporal dynamics is conventionally modeled with a 3D spatial-temporal kernel or its factorized version comprising a 2D spatial kernel and a 1D temporal kernel. The modeling power, nevertheless, is limited by the fixed window size and st…
Bi-Calibration Networks for Weakly-Supervised Video Representation Learning
Leveraging large volumes of web videos paired with searched queries or surrounding texts (e.g., titles) offers an economic and extensible alternative to supervised video representation learning. Nevertheless, modeling such weakly v…
Stand-Alone Inter-Frame Attention in Video Models
Motion, as the uniqueness of a video, has been critical to the development of video understanding models. Modern deep learning models leverage motion by either executing spatio-temporal 3D convolutions, factorizing 3D convolutions into spa…
Silver-Bullet-3D at ManiSkill 2021: Learning-from-Demonstrations and Heuristic Rule-based Methods for Object Manipulation
This paper presents an overview and comparative analysis of our systems designed for the following two tracks in the SAPIEN ManiSkill Challenge 2021: No Interaction Track: The No Interaction track targets learning policies from pre-collect…
Learning to Localize Actions from Moments
With the knowledge of action moments (i.e., trimmed video clips that each contains an action instance), humans could routinely localize an action temporally in an untrimmed video. Nevertheless, most practical methods still require all trai…
Gaussian Temporal Awareness Networks for Action Localization
Temporally localizing actions in a video is a fundamental challenge in video understanding. Most existing approaches have often drawn inspiration from image object detection and extended the advances, e.g., SSD and Faster R-CNN, to produce…
vireoJD-MM at Activity Detection in Extended Videos
This notebook paper presents an overview and comparative analysis of our system designed for activity detection in extended videos (ActEV-PC) in the ActivityNet Challenge 2019. Specifically, we exploit person/vehicle detections at the spatial leve…