Motion compensation
MoGAN: Improving Motion Quality in Video Diffusion via Few-Step Motion Adversarial Post-Training
Video diffusion models achieve strong frame-level fidelity but still struggle with motion coherence and realism, often producing jitter, ghosting, or implausible dynamics. A key limitation is that the standard denoising MSE objec…
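The abstract cuts off at the core idea, but the named combination (a per-frame denoising MSE objective plus an adversarial term computed on motion) can be sketched. Everything below, from the frame-difference motion proxy to the discriminator architecture and loss weights, is an illustrative assumption rather than the paper's actual design.

```python
import torch
import torch.nn as nn

class MotionDiscriminator(nn.Module):
    """Hypothetical discriminator over frame-difference 'motion' volumes.
    Operates on temporal differences of a video clip, not raw frames."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(channels, 32, kernel_size=3, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv3d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool3d(1),
            nn.Flatten(),
            nn.Linear(64, 1),
        )

    def forward(self, video):                         # video: (B, C, T, H, W)
        motion = video[:, :, 1:] - video[:, :, :-1]   # crude motion proxy
        return self.net(motion)                       # real/fake logit per clip

def generator_loss(disc, denoised, target, mse_weight=1.0, adv_weight=0.1):
    """Denoising MSE keeps per-frame fidelity; the non-saturating GAN term
    pushes the motion of the denoised clip toward the real distribution."""
    mse = torch.mean((denoised - target) ** 2)
    adv = -torch.mean(torch.log(torch.sigmoid(disc(denoised)) + 1e-8))
    return mse_weight * mse + adv_weight * adv
```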
Respiratory Motion Compensation and Haptic Feedback for X-ray-Guided Teleoperated Robotic Needle Insertion
Respiratory motion limits the accuracy and precision of abdominal percutaneous procedures. In this paper, respiratory motion is compensated robotically using motion estimation models. Additionally, a teleoperated insertion is performed usi…
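The visible text does not specify the motion estimation models; one minimal, commonly used stand-in is a periodic displacement model fitted to a tracked target and extrapolated slightly ahead to cover actuation latency. The sketch below assumes a single sinusoid at a known breathing frequency.

```python
import numpy as np

def fit_respiratory_model(t, displacement, freq_hz):
    """Fit d(t) ~ a*sin(w*t) + b*cos(w*t) + c by linear least squares.
    t: sample times (s); displacement: tracked target position (mm);
    freq_hz: breathing frequency, e.g. from a spectral peak."""
    w = 2 * np.pi * freq_hz
    A = np.column_stack([np.sin(w * t), np.cos(w * t), np.ones_like(t)])
    coeffs, *_ = np.linalg.lstsq(A, displacement, rcond=None)
    return coeffs

def predict_displacement(coeffs, t_future, freq_hz):
    """Predict slightly ahead of time to cover robot actuation latency."""
    w = 2 * np.pi * freq_hz
    a, b, c = coeffs
    return a * np.sin(w * t_future) + b * np.cos(w * t_future) + c

# Usage: feed the prediction to the robot as a compensating offset.
t = np.linspace(0, 10, 200)
d = 5.0 * np.sin(2 * np.pi * 0.25 * t) + np.random.normal(0, 0.3, t.shape)
coeffs = fit_respiratory_model(t, d, freq_hz=0.25)
print(predict_displacement(coeffs, t_future=10.1, freq_hz=0.25))
```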
Motion Marionette: Rethinking Rigid Motion Transfer via Prior Guidance
We present Motion Marionette, a zero-shot framework for rigid motion transfer from monocular source videos to single-view target images. Previous works typically employ geometric, generative, or simulation priors to guide the transfer proc…
ReDirector: Creating Any-Length Video Retakes with Rotary Camera Encoding
We present ReDirector, a novel camera-controlled video retake generation method for dynamically captured variable-length videos. In particular, we rectify a common misuse of RoPE in previous works by aligning the spatiotemporal positions o…
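The truncation hides the details, but the stated fix of aligning spatiotemporal positions under RoPE can be illustrated. The rotary embedding below is standard; the contrast between naive and aligned temporal indexing is my reading of the abstract, not code from the paper.

```python
import torch

def rope_rotate(x, positions, base=10000.0):
    """Standard 1-D rotary position embedding.
    x: (..., seq, dim) with even dim; positions: (seq,) float indices."""
    dim = x.shape[-1]
    inv_freq = base ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    angles = positions[:, None] * inv_freq[None, :]        # (seq, dim/2)
    sin, cos = angles.sin(), angles.cos()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Naive indexing restarts the retake's frame indices at 0, so frame k of the
# retake is rotated as if it were frame k of the source. Aligned indexing
# (the abstract's point, as I read it) keeps the source's absolute positions.
q = torch.randn(1, 16, 64)                        # 16 retake frames
naive_pos = torch.arange(16, dtype=torch.float32)
aligned_pos = torch.arange(40, 56, dtype=torch.float32)  # source frames 40..55
q_naive = rope_rotate(q, naive_pos)
q_aligned = rope_rotate(q, aligned_pos)
```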
MotionV2V: Editing Motion in a Video
While generative video models have achieved remarkable fidelity and consistency, applying these capabilities to video editing remains a complex challenge. Recent research has explored motion controllability as a means to enhance text-to-vi…
Augmented Reality Surgical Guidance System with Adaptive Depth-Based Registration Algorithms
Intraoperative patient motion degrades static registration. We propose an adaptive AR guidance system with feedback-driven registration updates. The system employs particle filter-based motion compensation and multi-scale ICP refinement. I…
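Particle filtering and ICP are named but not detailed, so the sketch below is a generic stand-in: a particle filter over a 3-D patient offset with a random-walk motion model and a depth-residual likelihood, whose estimate would seed the multi-scale ICP refinement.

```python
import numpy as np

class TranslationParticleFilter:
    """Minimal particle filter over a 3-D patient offset (mm).
    The depth-residual likelihood is a stand-in for a real measurement model."""
    def __init__(self, n=500, motion_std=0.5):
        self.particles = np.zeros((n, 3))
        self.weights = np.full(n, 1.0 / n)
        self.motion_std = motion_std

    def predict(self):
        # Random-walk motion model for slow intraoperative drift.
        self.particles += np.random.normal(0, self.motion_std,
                                           self.particles.shape)

    def update(self, residual_fn, meas_std=1.0):
        # residual_fn(offset) -> scalar depth-map misalignment at that offset.
        r = np.array([residual_fn(p) for p in self.particles])
        self.weights *= np.exp(-0.5 * (r / meas_std) ** 2)
        self.weights /= self.weights.sum() + 1e-12

    def resample_and_estimate(self):
        n = len(self.particles)
        idx = np.random.choice(n, n, p=self.weights)
        self.particles = self.particles[idx]
        self.weights.fill(1.0 / n)
        return self.particles.mean(axis=0)  # offset to seed ICP refinement
```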
STCDiT: Spatio-Temporally Consistent Diffusion Transformer for High-Quality Video Super-Resolution
We present STCDiT, a video super-resolution framework built upon a pre-trained video diffusion model, aiming to restore structurally faithful and temporally stable videos from degraded inputs, even under complex camera motions. The main ch…
Point-to-Point: Sparse Motion Guidance for Controllable Video Editing
Accurately preserving motion while editing a subject remains a core challenge in video editing tasks. Existing methods often face a trade-off between edit and motion fidelity, as they rely on motion representations that are either overfitt…
EgoControl: Controllable Egocentric Video Generation via 3D Full-Body Poses
Egocentric video generation with fine-grained control through body motion is a key requirement for embodied AI agents that can simulate, predict, and plan actions. In this work, we propose EgoControl, a pose-controllable video diffusio…
MotionDuet: Dual-Conditioned 3D Human Motion Generation with Video-Regularized Text Learning
3D human motion generation is pivotal across film, animation, gaming, and embodied intelligence. Traditional 3D motion synthesis relies on costly motion capture, while recent work shows that 2D videos provide rich, temporally coherent obse…
Show Me: Unifying Instructional Image and Video Generation with Diffusion Models
Generating visual instructions in a given context is essential for developing interactive world simulators. While prior works address this problem through either text-guided image manipulation or video prediction, these tasks are typically…
Planning with Sketch-Guided Verification for Physics-Aware Video Generation
Recent video generation approaches increasingly rely on planning intermediate control signals such as object trajectories to improve temporal coherence and motion fidelity. However, these methods mostly employ single-shot plans that are ty…
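The abstract contrasts single-shot plans with verified ones, which suggests an iterative plan, verify, refine loop. The skeleton below illustrates that control flow only; all callables (propose_plan, render_sketch, verify) are hypothetical.

```python
def plan_verify_refine(prompt, propose_plan, render_sketch, verify,
                       max_rounds=3):
    """Generic plan -> verify -> refine loop. propose_plan(prompt, feedback)
    returns e.g. object trajectories, render_sketch(plan) gives a cheap
    preview, and verify(sketch) returns (score, feedback) from a
    physics/semantic checker."""
    feedback = None
    best_plan, best_score = None, float("-inf")
    for _ in range(max_rounds):
        plan = propose_plan(prompt, feedback)
        score, feedback = verify(render_sketch(plan))
        if score > best_score:
            best_plan, best_score = plan, score
    return best_plan  # hand off to the video generator as a control signal
```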
Flow-Guided Implicit Neural Representation for Motion-Aware Dynamic MRI Reconstruction
Dynamic magnetic resonance imaging (dMRI) captures temporally-resolved anatomy but is often challenged by limited sampling and motion-induced artifacts. Conventional motion-compensated reconstructions typically rely on pre-estimated optica…
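One generic way to make an implicit representation motion-aware, consistent with the title, is to query a coordinate MLP at flow-displaced positions and penalize intensity changes along the flow. The construction below is an assumed illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class INR(nn.Module):
    """Coordinate MLP: (x, y, t) -> image intensity."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, coords):        # coords: (N, 3)
        return self.net(coords)

def flow_consistency_loss(inr, flow, coords, dt=0.05):
    """Encourage I(x + u*dt, t + dt) == I(x, t): intensity is transported
    along the (pre-estimated or jointly learned) flow u(x, t)."""
    u = flow(coords)                                   # (N, 2) in-plane motion
    warped = torch.cat([coords[:, :2] + u * dt,
                        coords[:, 2:3] + dt], dim=1)   # displaced coordinates
    return torch.mean((inr(warped) - inr(coords)) ** 2)
```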
Rethinking Diffusion Model-Based Video Super-Resolution: Leveraging Dense Guidance from Aligned Features
Diffusion model (DM) based Video Super-Resolution (VSR) approaches achieve impressive perceptual quality. However, they suffer from error accumulation, spatial artifacts, and a trade-off between perceptual quality and fidelity, primarily c…
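Dense guidance from aligned features typically starts with flow-based warping: features from a neighboring frame are resampled into the current frame's coordinates before being injected as conditioning. The grid_sample-based backward warp below is the standard operation; its role as this paper's guidance path is an assumption.

```python
import torch
import torch.nn.functional as F

def backward_warp(feat, flow):
    """Warp features from a neighboring frame into the current frame.
    feat: (B, C, H, W); flow: (B, 2, H, W) in pixels, current -> neighbor."""
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=feat.dtype), torch.arange(w, dtype=feat.dtype),
        indexing="ij")
    grid = torch.stack([xs, ys]).unsqueeze(0) + flow   # sampling positions
    # Normalize to [-1, 1] for grid_sample.
    gx = 2.0 * grid[:, 0] / (w - 1) - 1.0
    gy = 2.0 * grid[:, 1] / (h - 1) - 1.0
    return F.grid_sample(feat, torch.stack([gx, gy], dim=-1),
                         align_corners=True, padding_mode="border")

# The warped neighbor features are spatially aligned with the current frame,
# so they can be concatenated (or added) as dense guidance for the denoiser.
```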
PostCam: Camera-Controllable Novel-View Video Generation with Query-Shared Cross-Attention
We propose PostCam, a framework for novel-view video generation that enables post-capture editing of camera trajectories in dynamic scenes. We find that existing video recapture methods suffer from suboptimal camera motion injection strate…
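Only the module name survives the truncation, so the block below is a literal, assumed reading of query-shared cross-attention: a single query projection from the video latents attends to two key/value streams (say, camera-pose tokens and source-frame tokens), and the two results are fused by summation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuerySharedCrossAttention(nn.Module):
    """One shared query projection attends to two conditioning streams.
    Single-head for brevity; the sum-fusion is an illustrative assumption."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.kv_a = nn.Linear(dim, 2 * dim)   # e.g. camera-pose tokens
        self.kv_b = nn.Linear(dim, 2 * dim)   # e.g. source-frame tokens
        self.out = nn.Linear(dim, dim)

    def forward(self, x, cond_a, cond_b):
        q = self.q(x)                                      # (B, N, D), shared
        ka, va = self.kv_a(cond_a).chunk(2, dim=-1)
        kb, vb = self.kv_b(cond_b).chunk(2, dim=-1)
        out_a = F.scaled_dot_product_attention(q, ka, va)  # same q for both
        out_b = F.scaled_dot_product_attention(q, kb, vb)
        return x + self.out(out_a + out_b)
```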
A Respiratory Motion Analysis for Guiding Stereotactic Arrhythmia Radiotherapy Motion Management
Stereotactic Arrhythmia Radiotherapy (STAR) treats ventricular tachycardia (VT) but requires internal target volume (ITV) expansions to compensate for cardiorespiratory motion. Current clinical r4DCT imaging methods are limited, and the re…
VTinker: Guided Flow Upsampling and Texture Mapping for High-Resolution Video Frame Interpolation
Due to large pixel movement and high computational cost, estimating the motion of high-resolution frames is challenging. Thus, most flow-based Video Frame Interpolation (VFI) methods first predict bidirectional flows at low resolution and …
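The baseline step the abstract describes, predicting bidirectional flow at low resolution and upsampling it, at minimum requires rescaling the flow vectors along with the grid. The naive bilinear version is below; per the title, VTinker's guided upsampling targets exactly its failure mode at motion boundaries.

```python
import torch
import torch.nn.functional as F

def upsample_flow(flow_lr, scale):
    """Bilinearly upsample a low-resolution flow field.
    flow_lr: (B, 2, h, w) in pixels at low resolution.
    The vectors must be multiplied by `scale` too, since a displacement of
    1 low-res pixel corresponds to `scale` high-res pixels."""
    flow_hr = F.interpolate(flow_lr, scale_factor=scale,
                            mode="bilinear", align_corners=False)
    return flow_hr * scale

# Naive bilinear upsampling blurs motion boundaries, which is the failure
# mode guided methods aim to fix using the high-resolution input frames.
flow = torch.randn(1, 2, 135, 240)
print(upsample_flow(flow, 8).shape)   # -> (1, 2, 1080, 1920)
```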