Zhendong Mao
LongAnimation: Long Animation Generation with Dynamic Global-Local Memory
Animation colorization is a crucial part of real-world animation industry production, and colorizing long animations carries high labor costs. Therefore, automated long animation colorization based on video generation models has significant research…
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
Deep Research Agents are a prominent category of LLM-based agents. By autonomously orchestrating multistep web exploration, targeted retrieval, and higher-order synthesis, they transform vast amounts of online information into analyst-grad…
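The preview above describes the generic loop such agents run: explore the web over multiple steps, retrieve targeted sources, and synthesize the findings. As a rough illustration only, the sketch below shows that loop in Python; the llm, web_search, and fetch_page callables are hypothetical placeholders, not an API defined by DeepResearch Bench or any specific agent.

def deep_research(question, llm, web_search, fetch_page, max_steps=5):
    """Illustrative explore -> retrieve -> synthesize loop for a research agent."""
    notes = []
    query = question
    for _ in range(max_steps):
        # targeted retrieval: search, then read the top hits
        for url in web_search(query)[:3]:
            notes.append(fetch_page(url))
        # multistep exploration: let the model decide whether to keep digging
        query = llm(
            f"Question: {question}\nNotes so far: {notes}\n"
            "Reply with a follow-up search query, or DONE if the notes suffice."
        )
        if query.strip() == "DONE":
            break
    # higher-order synthesis: turn the collected notes into a long-form report
    return llm(
        f"Write an analyst-grade report answering: {question}\n"
        f"Base it only on these notes: {notes}"
    )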
From Real to Synthetic: Synthesizing Millions of Diversified and Complicated User Instructions with Attributed Grounding
The pursuit of diverse, complex, and large-scale instruction data is crucial for automatically aligning large language models (LLMs). While there are methods capable of generating synthetic instructions at scale, they either suffer from li…
Pro3D-Editor: A Progressive-Views Perspective for Consistent and Precise 3D Editing
Text-guided 3D editing aims to precisely edit semantically relevant local 3D regions, which has significant potential for various practical applications ranging from 3D games to film production. Existing methods typically follow a view-ind…
Rationales Are Not Silver Bullets: Measuring the Impact of Rationales on Model Performance and Reliability
Training language models with rationale augmentation has been shown to be beneficial in many existing works. In this paper, we find that this prevailing view does not hold consistently. We conduct comprehensive investigations to tho…
MIRROR: Multi-agent Intra- and Inter-Reflection for Optimized Reasoning in Tool Learning
Complex tasks involving tool integration pose significant challenges for Large Language Models (LLMs), leading to the emergence of multi-agent workflows as a promising solution. Reflection has emerged as an effective strategy for correctin…
Leveraging Importance Sampling to Detach Alignment Modules from Large Language Models
The widespread adoption of large language models (LLMs) across industries has increased the demand for high-quality and customizable outputs. However, traditional alignment methods often require retraining large pretrained models, making i…
Training LLM-Based Agents with Synthetic Self-Reflected Trajectories and Partial Masking
Autonomous agents, which perceive environments and take actions to achieve goals, have become increasingly feasible with the advancements in large language models (LLMs). However, current powerful agents often depend on sophisticated promp…
DACL-RAG: Data Augmentation Strategy with Curriculum Learning for Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) is an effective method to enhance the capabilities of large language models (LLMs). Existing methods typically optimize the retriever or the generator in a RAG system by directly using the top-k retriev…
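For context, the "top-k retrieved documents" step the preview mentions is the standard retrieve-then-generate pattern sketched below. This is generic RAG background, not the DACL-RAG method itself; embed and generate stand in for an embedding model and an LLM and are hypothetical placeholders.

import numpy as np

def rag_answer(query, corpus, embed, generate, k=5):
    """Retrieve the top-k most similar documents, then condition generation on them."""
    q = embed(query)                                     # query embedding (1-D array)
    doc_vecs = np.stack([embed(doc) for doc in corpus])  # document embeddings
    # cosine similarity between the query and every document
    scores = doc_vecs @ q / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q) + 1e-8
    )
    top_k = [corpus[i] for i in np.argsort(-scores)[:k]]
    context = "\n\n".join(top_k)
    return generate(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")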
HDGlyph: A Hierarchical Disentangled Glyph-Based Framework for Long-Tail Text Rendering in Diffusion Models
Visual text rendering, which aims to accurately integrate specified textual content within generated images, is critical for various applications such as commercial design. Despite recent advances, current methods struggle with long-tail t…
DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization
Customized text-to-video generation with pre-trained large-scale models has recently garnered significant attention by focusing on identity and motion consistency. Existing works typically follow the isolated customization paradigm, wher…
Land use types and soil pH co-mediate bacterial community assembly processes: Application of the neutral community model and null model to determine stochastic and deterministic processes in a subtropical basin, China
Land use regimes strongly impact bacterial microbial communities. However, the ecological processes shaping bacterial community assembly under various land use types and the factors altering the balance between these processes remain poorl…
Automated Creativity Evaluation for Large Language Models: A Reference-Based Approach
Creative writing is a key capability of Large Language Models (LLMs), with potential applications in literature, storytelling, and various creative domains. However, evaluating the creativity of machine-generated texts remains a significan…
HistLLM: A Unified Framework for LLM-Based Multimodal Recommendation with User History Encoding and Compression
While large language models (LLMs) have proven effective in leveraging textual data for recommendations, their application to multimodal recommendation tasks remains relatively underexplored. Although LLMs can process multimodal informatio…
D$^2$iT: Dynamic Diffusion Transformer for Accurate Image Generation
Diffusion models are widely recognized for their ability to generate high-fidelity images. Despite the excellent performance and scalability of the Diffusion Transformer (DiT) architecture, it applies fixed compression across different ima…
Graph Mixture of Experts and Memory-augmented Routers for Multivariate Time Series Anomaly Detection
Multivariate time series (MTS) anomaly detection is a critical task that involves identifying abnormal patterns or events in data that consist of multiple interrelated time series. In order to better model the complex interdependence betwe…
ELDER: Enhancing Lifelong Model Editing with Mixture-of-LoRA
Large language models (LLMs) require model editing to efficiently update specific knowledge within them and avoid factual errors. Most model editing methods are solely designed for single-time use and result in a significant forgetting eff…
CustomContrast: A Multilevel Contrastive Perspective for Subject-Driven Text-to-Image Customization
Subject-driven text-to-image (T2I) customization has drawn significant interest in academia and industry. This task enables pre-trained models to generate novel images based on unique subjects. Existing studies adopt a self-reconstructive …
SVLTA: Benchmarking Vision-Language Temporal Alignment via Synthetic Video Situation
Vision-language temporal alignment is a crucial capability for human dynamic recognition and cognition in real-world scenarios. While existing research focuses on capturing vision-language relevance, it faces limitations due to biased temp…
Leveraging Robust Optimization for LLM Alignment under Distribution Shifts
Preference alignment methods are increasingly critical for steering large language models (LLMs) to generate outputs consistent with human values. While recent approaches often rely on synthetic data generated by LLMs for scalability and c…
RealGeneral: Unifying Visual Generation via Temporal In-Context Learning with Video Models
Unifying diverse image generation tasks within a single framework remains a fundamental challenge in visual generation. While large language models (LLMs) achieve unification through task-agnostic data and generation, existing visual gener…
On-the-fly Preference Alignment via Principle-Guided Decoding
With the rapidly expanding landscape of large language models, aligning model generations with human values and preferences is becoming increasingly important. Popular alignment methods, such as Reinforcement Learning from Human Feedback, …