Fuchen Long
HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer
Recent advancements in image generative foundation models have prioritized quality improvements but often at the cost of increased computational complexity and inference latency. To address this critical trade-off, we introduce HiDream-I1,…
MotionPro: A Precise Motion Controller for Image-to-Video Generation
Animating images with interactive motion control has garnered popularity for image-to-video (I2V) generation. Modern approaches typically rely on large Gaussian kernels to extend motion trajectories as condition without explicitly defining…
Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion
The first-in-first-out (FIFO) video diffusion, built on a pre-trained text-to-video model, has recently emerged as an effective approach for tuning-free long video generation. This technique maintains a queue of video frames with progressi…
Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution
Diffusion models are just at a tipping point for the image super-resolution task. Nevertheless, it is not trivial to capitalize on diffusion models for video super-resolution, which necessitates not only the preservation of visual appearance fr…
TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models
Recent advances in text-to-video generation have demonstrated the utility of powerful diffusion models. Nevertheless, the problem is not trivial when shaping diffusion models to animate a static image (i.e., image-to-video generation). The d…
VideoStudio: Generating Consistent-Content and Multi-Scene Videos
The recent innovations and breakthroughs in diffusion models have significantly expanded the possibilities of generating high-quality videos for the given prompts. Most existing works tackle the single-scene scenario with only one video ev…
Dynamic Temporal Filtering in Video Models
Video temporal dynamics is conventionally modeled with a 3D spatial-temporal kernel or its factorized version comprising a 2D spatial kernel and a 1D temporal kernel. The modeling power, nevertheless, is limited by the fixed window size and st…
Bi-Calibration Networks for Weakly-Supervised Video Representation Learning
Leveraging large volumes of web videos paired with searched queries or surrounding texts (e.g., titles) offers an economic and extensible alternative to supervised video representation learning. Nevertheless, modeling such weakly v…
Stand-Alone Inter-Frame Attention in Video Models
Motion, as the uniqueness of a video, has been critical to the development of video understanding models. Modern deep learning models leverage motion by either executing spatio-temporal 3D convolutions, factorizing 3D convolutions into spa…
Silver-Bullet-3D at ManiSkill 2021: Learning-from-Demonstrations and Heuristic Rule-based Methods for Object Manipulation
This paper presents an overview and comparative analysis of our systems designed for the following two tracks in the SAPIEN ManiSkill Challenge 2021: No Interaction Track: The No Interaction track targets learning policies from pre-collect…
Learning to Localize Actions from Moments
With the knowledge of action moments (i.e., trimmed video clips that each contains an action instance), humans could routinely localize an action temporally in an untrimmed video. Nevertheless, most practical methods still require all trai…
Gaussian Temporal Awareness Networks for Action Localization
Temporally localizing actions in a video is a fundamental challenge in video understanding. Most existing approaches have often drawn inspiration from image object detection and extended the advances, e.g., SSD and Faster R-CNN, to produce…
vireoJD-MM at Activity Detection in Extended Videos
This notebook paper presents an overview and comparative analysis of our system designed for activity detection in extended videos (ActEV-PC) in the ActivityNet Challenge 2019. Specifically, we exploit person/vehicle detections at the spatial leve…