Difan Liu
DuetSVG: Unified Multimodal SVG Generation with Internal Visual Guidance
Recent vision-language model (VLM)-based approaches have achieved impressive results on SVG generation. However, because they generate only text and lack visual signals during decoding, they often struggle with complex semantics and fail t…
Rethinking Layered Graphic Design Generation with a Top-Down Approach
Graphic design is crucial for conveying ideas and messages. Designers usually organize their work into objects, backgrounds, and vectorized text layers to simplify editing. However, this workflow demands considerable expertise. With the ri…
How to Train Your Dragon: Automatic Diffusion-Based Rigging for Characters with Diverse Topologies
Recent diffusion-based methods have achieved impressive results on animating images of human subjects. However, most of that success has built on human-specific body pose representations and extensive training with labeled real videos. In …
Mean-Shift Distillation for Diffusion Mode Seeking
We present mean-shift distillation, a novel diffusion distillation technique that provides a provably good proxy for the gradient of the diffusion output distribution. This is derived directly from mean-shift mode seeking on the distributi…
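The mode seeking the method builds on can be illustrated with a minimal sketch of classic kernel mean-shift on a sample set (this shows the standard mean-shift update only, not the paper's distillation objective; the data, bandwidth, and iteration count are illustrative):

```python
import numpy as np

def mean_shift_step(x, samples, bandwidth=1.0):
    """One mean-shift update: move x toward a local mode of the
    sample density under a Gaussian kernel."""
    diff = samples - x                                        # (N, D)
    w = np.exp(-np.sum(diff ** 2, axis=1) / (2 * bandwidth ** 2))
    return (w[:, None] * samples).sum(axis=0) / w.sum()

# Two well-separated 1D clusters; iterating from 0.9 converges
# toward the mode of the nearby cluster around 1.0.
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(-1, 0.1, (200, 1)),
                          rng.normal(1, 0.1, (200, 1))])
x = np.array([0.9])
for _ in range(50):
    x = mean_shift_step(x, samples, bandwidth=0.2)
```

Iterating this update is what makes mean-shift a mode seeker: each step climbs the kernel density estimate of the samples.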
Move-in-2D: 2D-Conditioned Human Motion Generation
Generating realistic human videos remains a challenging task, with the most effective methods currently relying on a human motion sequence as a control signal. Existing approaches often use existing motion extracted from other videos, whic…
Progressive Autoregressive Video Diffusion Models
Current frontier video diffusion models have demonstrated remarkable results at generating high-quality videos. However, they can only generate short video clips, normally around 10 seconds or 240 frames, due to computation limitations dur…
HARIVO: Harnessing Text-to-Image Models for Video Generation
We present a method to create diffusion-based video models from pretrained Text-to-Image (T2I) models. Recently, AnimateDiff proposed freezing the T2I model while only training temporal layers. We advance this method by proposing a unique …
SNED: Superposition Network Architecture Search for Efficient Video Diffusion Model
While AI-generated content has garnered significant attention, achieving photo-realistic video synthesis remains a formidable challenge. Despite the promising advances in diffusion models for video generation quality, the complex model arc…
NIVeL: Neural Implicit Vector Layers for Text-to-Vector Generation
The success of denoising diffusion models in representing rich data distributions over 2D raster images has prompted research on extending them to other data representations, such as vector graphics. Unfortunately, due to their variable str…
Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models
Diffusion Models (DMs) have exhibited superior performance in generating high-quality and diverse images. However, this exceptional performance comes at the cost of expensive architectural design, particularly due to the attention module h…
VideoGigaGAN: Towards Detail-rich Video Super-Resolution
Video super-resolution (VSR) approaches have shown impressive temporal consistency in upsampled videos. However, these approaches tend to generate blurrier results than their image counterparts as they are limited in their generative capab…
Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models
Image customization has been extensively studied in text-to-image (T2I) diffusion models, leading to impressive outcomes and applications. With the emergence of text-to-video (T2V) diffusion models, its temporal counterpart, motion customi…
Analysis of Spatiotemporal Changes in the Gravitational Structure of Urban Agglomerations in Northern and Southern Xinjiang Based on a Gravitational Model
Urban agglomerations play a significant role in advancing integrated regional development. Nevertheless, as they expand, urban agglomerations have shown only a weak ability to attract surrounding cities. Presently, finding solutions to sta…
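For context, analyses of this kind typically build on the classic gravity model, which scores the attraction between cities i and j as F_ij = k * M_i * M_j / d_ij^beta. A minimal sketch (the function name, masses, and parameter values are illustrative assumptions, not the paper's calibration):

```python
def gravity(mass_i, mass_j, distance, k=1.0, beta=2.0):
    """Gravity-model attraction between cities i and j:
    F_ij = k * M_i * M_j / d_ij**beta.
    The masses stand in for a composite city-size index; k and
    beta are illustrative defaults, not fitted values."""
    return k * mass_i * mass_j / distance ** beta

# Hypothetical city pairs: equal mass product, different distances.
near = gravity(100, 50, 10)   # → 50.0
far = gravity(100, 50, 20)    # → 12.5 (attraction decays with distance)
```

The distance-decay exponent beta controls how quickly attraction falls off; studies often calibrate it per region rather than fixing it at 2.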
VecFusion: Vector Font Generation with Diffusion
We present VecFusion, a new neural architecture that can generate vector fonts with varying topological structures and precise control point positions. Our approach is a cascaded diffusion model which consists of a raster diffusion model f…
LRM: Large Reconstruction Model for Single Image to 3D
We propose the first Large Reconstruction Model (LRM) that predicts the 3D model of an object from a single input image within just 5 seconds. In contrast to many previous methods that are trained on small-scale datasets such as ShapeNet i…
Analysis of the Difference in Changes to Farmers’ Livelihood Capital under Different Land Transfer Modes—A Case Study of Manas County, Xinjiang, China
Farmers’ livelihoods change as a direct result of land transfer. This study examined the impacts of land transfer on several indicators of farmers’ livelihood capital, as well as variations in the effects of different land transfer methods …
ASSET: Autoregressive Semantic Scene Editing with Transformers at High Resolutions
We present ASSET, a neural architecture for automatically modifying an input high-resolution image according to a user's edits on its semantic segmentation map. Our architecture is based on a transformer with a novel attention mechanism. O…
Neural Strokes: Stylized Line Drawing of 3D Shapes
This paper introduces a model for producing stylized line drawings from 3D shapes. The model takes a 3D shape and a viewpoint as input, and outputs a drawing with textured strokes, with variations in stroke thickness, deformation, and colo…
Neural Shape Parsers for Constructive Solid Geometry
Constructive Solid Geometry (CSG) is a geometric modeling technique that defines complex shapes by recursively applying boolean operations on primitives such as spheres and cylinders. We present CSGNet, a deep network architecture that take…
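The boolean operations described above can be sketched on signed distance functions, where union, intersection, and difference become pointwise min/max (an illustrative SDF formulation of CSG, not the paper's network):

```python
import math

def sphere(cx, cy, cz, r):
    """Signed distance to a sphere: negative inside, positive outside."""
    return lambda x, y, z: math.sqrt((x - cx) ** 2 + (y - cy) ** 2
                                     + (z - cz) ** 2) - r

# CSG boolean operations expressed on signed distance functions.
def union(a, b):     return lambda x, y, z: min(a(x, y, z), b(x, y, z))
def intersect(a, b): return lambda x, y, z: max(a(x, y, z), b(x, y, z))
def subtract(a, b):  return lambda x, y, z: max(a(x, y, z), -b(x, y, z))

# A unit sphere with a smaller sphere carved out of its side.
shape = subtract(sphere(0, 0, 0, 1.0), sphere(0.8, 0, 0, 0.5))
inside = shape(0, 0, 0) < 0     # the origin remains inside the result
carved = shape(0.9, 0, 0) > 0   # a point in the carved notch is outside
```

Recursively composing these three operations over primitives is exactly the program structure that a CSG parser has to recover from a shape.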
Neural Contours: Learning to Draw Lines from 3D Shapes
This paper introduces a method for learning to generate line drawings from 3D models. Our architecture incorporates a differentiable module operating on geometric features of the 3D model, and an image-based module operating on view-based …
ParSeNet: A Parametric Surface Fitting Network for 3D Point Clouds
CSGNet: Neural Shape Parser for Constructive Solid Geometry
We present a neural architecture that takes as input a 2D or 3D shape and outputs a program that generates the shape. The instructions in our program are based on constructive solid geometry principles, i.e., a set of boolean operations on…