Zeyue Tian
VAInpaint: Zero-Shot Video-Audio inpainting framework with LLMs-driven Module
Video and audio inpainting for mixed audio-visual content has recently become a crucial task in multimedia editing. However, precisely removing an object and its corresponding audio from a video without affecting the rest of the scene rema…
ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data
Vision-Language Models (VLMs) have enabled computer use agents (CUAs) that operate GUIs autonomously, showing great potential, yet progress is limited by the lack of large-scale, open-source computer use data and foundation models. In this…
YuE: Scaling Open Foundation Models for Long-Form Music Generation
We tackle the task of long-form music generation, particularly the challenging lyrics-to-song problem, by introducing YuE, a family of open foundation models based on the LLaMA2 architecture. Specifically, YuE scales to trillions …
Audio-FLAN: A Preliminary Release
Recent advancements in audio tokenization have significantly enhanced the integration of audio capabilities into large language models (LLMs). However, audio understanding and generation are often treated as distinct tasks, hindering the d…
Foundation Models for Music: A Survey
In recent years, foundation models (FMs) such as large language models (LLMs) and latent diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This comprehensive review examines state-of-the-art (SOTA) pre-trai…
ComposerX: Multi-Agent Symbolic Music Composition with LLMs
Music composition represents the creative side of humanity and is itself a complex task that requires the ability to understand and generate information with long dependency and harmony constraints. While demonstrating impressive capabiliti…
Multitarget Device-Free Localization via Cross-Domain Wi-Fi RSS Training Data and Attentional Prior Fusion
Device-free localization (DFL) using easily-obtained Wi-Fi received signal strength (RSS) has wide real-world applications for not requiring people to carry trackable devices. However, accurate multitarget DFL remains challenging due to th…
ChatMusician: Understanding and Generating Music Intrinsically with LLM
While Large Language Models (LLMs) demonstrate impressive capabilities in text generation, we find that their ability has yet to be generalized to music, humanity's creative language. We introduce ChatMusician, an open-source LLM that inte…
MARBLE: Music Audio Representation Benchmark for Universal Evaluation
In the era of extensive intersection between art and Artificial Intelligence (AI), such as image generation and fiction co-creation, AI for music remains relatively nascent, particularly in music understanding. This is evident in the limit…
Mixed Neural Voxels for Fast Multi-view Video Synthesis
Synthesizing high-fidelity videos from real-world multi-view input is challenging because of the complexities of real-world environments and highly dynamic motions. Previous works based on neural radiance fields have demonstrated high-qual…