Siyoon Jin
MATRIX: Mask Track Alignment for Interaction-aware Video Generation
Video DiTs have advanced video generation, yet they still struggle to model multi-instance or subject-object interactions. This raises a key question: How do these models internally represent interactions? To answer this, we curate MATRIX-…
Emergent Temporal Correspondences from Video Diffusion Transformers
Recent advancements in video diffusion models based on Diffusion Transformers (DiTs) have achieved remarkable success in generating temporally coherent videos. Yet, a fundamental question persists: how do these models internally establish …
Vid-CamEdit: Video Camera Trajectory Editing with Generative Rendering from Estimated Geometry
We introduce Vid-CamEdit, a novel framework for video camera trajectory editing, enabling the re-synthesis of monocular videos along user-defined camera paths. This task is challenging due to its ill-posed nature and the limited multi-view…
MoDiTalker: Motion-Disentangled Diffusion Model for High-Fidelity Talking Head Generation
Conventional GAN-based models for talking head generation often suffer from limited quality and unstable training. Recent approaches based on diffusion models have attempted to address these limitations and improve fidelity. However, they …
Appearance Matching Adapter for Exemplar-based Semantic Image Synthesis in-the-Wild
Exemplar-based semantic image synthesis generates images aligned with semantic content while preserving the appearance of an exemplar. Conventional structure-guidance models like ControlNet are limited as they rely solely on text prompts …
DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization
The objective of text-to-image (T2I) personalization is to customize a diffusion model to a user-provided reference concept, generating diverse images of the concept aligned with the target prompts. Conventional methods representing the re…