Siyoon Jin
MATRIX: Mask Track Alignment for Interaction-aware Video Generation
Video DiTs have advanced video generation, yet they still struggle to model multi-instance or subject-object interactions. This raises a key question: How do these models internally represent interactions? To answer this, we curate MATRIX-…
Emergent Temporal Correspondences from Video Diffusion Transformers
Recent advancements in video diffusion models based on Diffusion Transformers (DiTs) have achieved remarkable success in generating temporally coherent videos. Yet, a fundamental question persists: how do these models internally establish …
Vid-CamEdit: Video Camera Trajectory Editing with Generative Rendering from Estimated Geometry
We introduce Vid-CamEdit, a novel framework for video camera trajectory editing, enabling the re-synthesis of monocular videos along user-defined camera paths. This task is challenging due to its ill-posed nature and the limited multi-view…
MoDiTalker: Motion-Disentangled Diffusion Model for High-Fidelity Talking Head Generation
Conventional GAN-based models for talking head generation often suffer from limited quality and unstable training. Recent approaches based on diffusion models have attempted to address these limitations and improve fidelity. However, they …
Appearance Matching Adapter for Exemplar-based Semantic Image Synthesis in-the-Wild
Exemplar-based semantic image synthesis generates images aligned with semantic content while preserving the appearance of an exemplar. Conventional structure-guidance models like ControlNet are limited as they rely solely on text prompts …
DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization
The objective of text-to-image (T2I) personalization is to customize a diffusion model to a user-provided reference concept, generating diverse images of the concept aligned with the target prompts. Conventional methods representing the re…