Explanipedia

VAInpaint: Zero-Shot Video-Audio inpainting framework with LLMs-driven Module

Kongming Wu, Zeyue Tian, L. Ji, Qifeng Chen · 2025

Video and audio inpainting for mixed audio-visual content has recently become a crucial task in multimedia editing. However, precisely removing an object and its corresponding audio from a video without affecting the rest of the scene rema…

ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data

Zhaoyang Liu, Jingjing Xie, Zichen Ding, Z.‐L. Li, Bin Yang, et al. · 2025

Vision-Language Models (VLMs) have enabled computer use agents (CUAs) that operate GUIs autonomously, showing great potential, yet progress is limited by the lack of large-scale, open-source computer use data and foundation models. In this…

YuE: Scaling Open Foundation Models for Long-Form Music Generation

Ruibin Yuan, Hanfeng Lin, Shuyue Guo, Ge Zhang, Jiahao Pan, et al. · 2025

We tackle the task of long-form music generation, particularly the challenging lyrics-to-song problem, by introducing YuE, a family of open foundation models based on the LLaMA2 architecture. Specifically, YuE scales to trillions …

Audio-FLAN: A Preliminary Release

Liumeng Xue, Ziya Zhou, Jiahao Pan, Zixuan Li, Shuai Fan, et al. · 2025

Recent advancements in audio tokenization have significantly enhanced the integration of audio capabilities into large language models (LLMs). However, audio understanding and generation are often treated as distinct tasks, hindering the d…

Foundation Models for Music: A Survey

Yinghao Ma, Anders Øland, Anton Ragni, Bleiz M Del Sette, Charalampos Saitis, et al. · 2024

In recent years, foundation models (FMs) such as large language models (LLMs) and latent diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This comprehensive review examines state-of-the-art (SOTA) pre-trai…

ComposerX: Multi-Agent Symbolic Music Composition with LLMs

Qixin Deng, Qikai Yang, Ruibin Yuan, Yipeng Huang, Yi Wang, et al. · 2024

Music composition represents the creative side of humanity, and itself is a complex task that requires abilities to understand and generate information with long dependency and harmony constraints. While demonstrating impressive capabiliti…

Multitarget Device-Free Localization via Cross-Domain Wi-Fi RSS Training Data and Attentional Prior Fusion

Na Fan, Zeyue Tian, Amartansh Dubey, Samruddhi Deshmukh, Ross Murch, et al. · 2024

Device-free localization (DFL) using easily-obtained Wi-Fi received signal strength (RSS) has wide real-world applications for not requiring people to carry trackable devices. However, accurate multitarget DFL remains challenging due to th…

ChatMusician: Understanding and Generating Music Intrinsically with LLM

Ruibin Yuan, Hanfeng Lin, Y. F. Wang, Zeyue Tian, Shangda Wu, et al. · 2024

While Large Language Models (LLMs) demonstrate impressive capabilities in text generation, we find that their ability has yet to be generalized to music, humanity's creative language. We introduce ChatMusician, an open-source LLM that inte…

MARBLE: Music Audio Representation Benchmark for Universal Evaluation

Ruibin Yuan, Yinghao Ma, Yizhi Li, Ge Zhang, Xingran Chen, et al. · 2023

In the era of extensive intersection between art and Artificial Intelligence (AI), such as image generation and fiction co-creation, AI for music remains relatively nascent, particularly in music understanding. This is evident in the limit…

Mixed Neural Voxels for Fast Multi-view Video Synthesis

Feng Wang, Sinan Tan, Xinghang Li, Zeyue Tian, Huaping Liu · 2022

Synthesizing high-fidelity videos from real-world multi-view input is challenging because of the complexities of real-world environments and highly dynamic motions. Previous works based on neural radiance fields have demonstrated high-qual…

Zeyue Tian