Explanipedia

Dense Semantic Matching with VGGT Prior Open

Songlin Yang, Tong Wei, Yushi Lan, Zeqi Xiao, Anyi Rao, et al. · 2025

Semantic matching aims to establish pixel-level correspondences between instances of the same category and represents a fundamental task in computer vision. Existing approaches suffer from two limitations: (i) Geometric Ambiguity: Their re…

AI for Creative Visual Content Generation, Editing and Understanding Open

Or Patashnik, Gaurav Parmar, Anyi Rao, Ozgur Kara, Fabian Caba Heilbron, et al. · 2025

Generative AI for Film Creation: A Survey of Recent Advances Open

Ruihan Zhang, Borou Yu, Jiajian Min, Yetong Xin, Zheng Wei, et al. · 2025

Generative AI (GenAI) is transforming filmmaking, equipping artists with tools like text-to-image and image-to-video diffusion, neural radiance fields, avatar generation, and 3D synthesis. This paper examines the adoption of these technolo…

Light-A-Video: Training-free Video Relighting via Progressive Light Fusion Open

Yujie Zhou, Jiazi Bu, Pengyang Ling, Pan Zhang, Tong Wu, et al. · 2025

Recent advancements in image relighting models, driven by large-scale datasets and pre-trained diffusion models, have enabled the imposition of consistent lighting. However, video relighting still lags, primarily due to the excessive train…

Mindalogue: LLM-Powered Nonlinear Interaction for Effective Learning and Task Exploration Open

Rui Zhang, Ziyao Zhang, Frank Zhu, Jiajie Zhou, Anyi Rao · 2024

Current generative AI models like ChatGPT, Claude, and Gemini are widely used for knowledge dissemination, task decomposition, and creative thinking. However, their linear interaction methods often force users to repeatedly compare and cop…

ScriptViz: A Visualization Tool to Aid Scriptwriting based on a Large Movie Database Open

Anyi Rao, Jean-Peïc Chou, Maneesh Agrawala · 2024

Scriptwriters usually rely on their mental visualization to create a vivid story by using their imagination to see, feel, and experience the scenes they are writing. Besides mental visualization, they often refer to existing images or scen…

CinePreGen: Camera Controllable Video Previsualization via Engine-powered Diffusion Open

Yiran Chen, Anyi Rao, Xuekun Jiang, Shishi Xiao, Ruiqing Ma, et al. · 2024

With advancements in video generative AI models (e.g., SORA), creators are increasingly using these techniques to enhance video previsualization. However, they face challenges with incomplete and mismatched AI workflows. Existing methods m…

Cinematic Behavior Transfer via NeRF-based Differentiable Filming Open

Xuekun Jiang, Anyi Rao, Jingbo Wang, Dahua Lin, Bo Dai · 2023

In the evolving landscape of digital media and video production, the precise manipulation and reproduction of visual elements like camera movements and character actions are highly desired. Existing SLAM methods face limitations in dynamic…

SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models Open

Yuwei Guo, Ceyuan Yang, Anyi Rao, Maneesh Agrawala, Dahua Lin, et al. · 2023

The development of text-to-video (T2V), i.e., generating videos with a given text prompt, has been significantly advanced in recent years. However, relying solely on text prompts often results in ambiguous frame composition due to spatial …

Automated Conversion of Music Videos into Lyric Videos Open

Jiaju Ma, Anyi Rao, Li‐Yi Wei, Rubaiat Habib Kazi, Hijung Valentina Shin, et al. · 2023

Musicians and fans often produce lyric videos, a form of music videos that showcase the song's lyrics, for their favorite songs. However, making such videos can be challenging and time-consuming as the lyrics need to be added in synchro…

Zero-shot Skeleton-based Action Recognition via Mutual Information Estimation and Maximization Open

Yujie Zhou, Wenwen Qiang, Anyi Rao, Ning Lin, Bing Su, et al. · 2023

Zero-shot skeleton-based action recognition aims to recognize actions of unseen categories after training on data of seen categories. The key is to build the connection between visual and semantic space from seen to unseen classes. Previou…

HireVAE: An Online and Adaptive Factor Model Based on Hierarchical and Regime-Switch VAE Open

Zikai Wei, Anyi Rao, Bo Dai, Dahua Lin · 2023

Factor model is a fundamental investment tool in quantitative investment, which can be empowered by deep learning to become more flexible and efficient in practical complicated investing situations. However, it is still an open question to…

AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning Open

Yuwei Guo, Ceyuan Yang, Anyi Rao, Yaohui Wang, Yu Qiao, et al. · 2023

With the advance of text-to-image (T2I) diffusion models (e.g., Stable Diffusion) and corresponding personalization techniques such as DreamBooth and LoRA, everyone can manifest their imagination into high-quality images at an affordable c…

Self-Supervised Action Representation Learning from Partial Spatio-Temporal Skeleton Sequences Open

Zhou Yujie, Haodong Duan, Anyi Rao, Bing Su, Jiaqi Wang · 2023

Self-supervised learning has demonstrated remarkable capability in representation learning for skeleton-based action recognition. Existing methods mainly focus on applying global data augmentation to generate different views of the skeleto…

CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers Open

Dachuan Shi, Chaofan Tao, Anyi Rao, Zhendong Yang, Chun Yuan, et al. · 2023

Recent vision-language models have achieved tremendous advances. However, their computational costs are also escalating dramatically, making model acceleration exceedingly critical. To pursue more efficient vision-language Transformers, th…

Dynamic Storyboard Generation in an Engine-based Virtual Environment for Video Production Open

Anyi Rao, Xuekun Jiang, Yuwei Guo, Linning Xu, Lei Yang, et al. · 2023

Amateurs working on mini-films and short-form videos usually spend lots of time and effort on the multi-round complicated process of setting and adjusting scenes, plots, and cameras to deliver satisfying video shots. We present Virtual Dyn…

Temporal and Contextual Transformer for Multi-Camera Editing of TV Shows Open

Anyi Rao, Xuekun Jiang, Sichen Wang, Yuwei Guo, Zihao Liu, et al. · 2022

The ability to choose an appropriate camera view among multiple cameras plays a vital role in TV shows delivery. But it is hard to figure out the statistical pattern and apply intelligent processing due to the lack of high-quality training…

A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language Open

Bing Su, Dazhao Du, Yang Zhao, Zhou Yujie, Jiangmeng Li, et al. · 2022

Although artificial intelligence (AI) has made significant progress in understanding molecules in a wide range of fields, existing models generally acquire the single cognitive ability from the single molecular modality. Since the hierarch…

AutoGPart: Intermediate Supervision Search for Generalizable 3D Part Segmentation Open

Xueyi Liu, Xiaomeng Xu, Anyi Rao, Chuang Gan, Yi Li · 2022

Training a generalizable 3D part segmentation network is quite challenging but of great importance in real-world applications. To tackle this problem, some works design task-specific solutions by translating human understanding of the task…

BungeeNeRF: Progressive Neural Radiance Field for Extreme Multi-scale Scene Rendering Open

Yuanbo Xiangli, Linning Xu, Xingang Pan, Nanxuan Zhao, Anyi Rao, et al. · 2021

Neural radiance fields (NeRF) have achieved outstanding performance in modeling 3D objects and controlled scenes, usually under a single scale. In this work, we focus on multi-scale cases where large changes in imagery are observed at drast…

Online Multi-modal Person Search in Videos Open

Jiangyue Xia, Anyi Rao, Qingqiu Huang, Linning Xu, Jiangtao Wen, et al. · 2020

The task of searching certain people in videos has seen increasing potential in real-world applications, such as video organization and editing. Most existing approaches are devised to work in an offline manner, where identities can only b…

A Unified Framework for Shot Type Classification Based on Subject Centric Lens Open

Anyi Rao, Jiaze Wang, Linning Xu, Xuekun Jiang, Qingqiu Huang, et al. · 2020

Shots are key narrative elements of various videos, e.g. movies, TV series, and user-generated videos that are thriving over the Internet. The types of shots greatly influence how the underlying ideas, emotions, and messages are expressed.…

MovieNet: A Holistic Dataset for Movie Understanding Open

Qingqiu Huang, Yu Xiong, Anyi Rao, Jiaze Wang, Dahua Lin · 2020

Recent years have seen remarkable advances in visual understanding. However, how to understand a story-based long video with artistic styles, e.g. movie, remains challenging. In this paper, we introduce MovieNet -- a holistic dataset for m…

A Local-to-Global Approach to Multi-modal Movie Scene Segmentation Open

Anyi Rao, Linning Xu, Yu Xiong, Guodong Xu, Qingqiu Huang, et al. · 2020

Scene, as the crucial unit of storytelling in movies, contains complex activities of actors and their interactions in a physical environment. Identifying the composition of scenes serves as a critical step towards semantic understanding of…

Automatic Music Accompanist Open

Anyi Rao, Francis C. M. Lau · 2018

Automatic musical accompaniment is where a human musician is accompanied by a computer musician. The computer musician is able to produce musical accompaniment that relates musically to the human performance. The accompaniment should follo…

HotFlip: White-Box Adversarial Examples for Text Classification Open

Javid Ebrahimi, Anyi Rao, Daniel Lowd, Dejing Dou · 2018

We propose an efficient method to generate white-box adversarial examples to trick a character-level neural classifier. We find that only a few manipulations are needed to greatly decrease the accuracy. Our method relies on an atomic flip …

HotFlip: White-Box Adversarial Examples for Text Classification Open

Javid Ebrahimi, Anyi Rao, Daniel Lowd, Dejing Dou · 2017

We propose an efficient method to generate white-box adversarial examples to trick a character-level neural classifier. We find that only a few manipulations are needed to greatly decrease the accuracy. Our method relies on an atomic flip …

HotFlip: White-Box Adversarial Examples for NLP Open

Javid Ebrahimi, Anyi Rao, Daniel Lowd, Dejing Dou · 2017

We propose an efficient method to generate white-box adversarial examples to trick a character-level neural classifier. We find that only a few manipulations are needed to greatly decrease the accuracy. Our method relies on an atomic flip …
