Zhendong Mao
LongAnimation: Long Animation Generation with Dynamic Global-Local Memory
Animation colorization is a crucial part of real-world animation industry production, and colorizing long animations carries high labor costs. Therefore, automated long animation colorization based on video generation models has significant research…
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
Deep Research Agents are a prominent category of LLM-based agents. By autonomously orchestrating multistep web exploration, targeted retrieval, and higher-order synthesis, they transform vast amounts of online information into analyst-grad…
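The preview above describes the generic loop such agents run: explore the web over multiple steps, retrieve targeted sources, and synthesize the findings. As a rough illustration only, the sketch below shows that loop in Python; the llm, web_search, and fetch_page callables are hypothetical placeholders, not an API defined by DeepResearch Bench or any specific agent.

def deep_research(question, llm, web_search, fetch_page, max_steps=5):
    """Illustrative explore -> retrieve -> synthesize loop for a research agent."""
    notes = []
    query = question
    for _ in range(max_steps):
        # targeted retrieval: search, then read the top hits
        for url in web_search(query)[:3]:
            notes.append(fetch_page(url))
        # multistep exploration: let the model decide whether to keep digging
        query = llm(
            f"Question: {question}\nNotes so far: {notes}\n"
            "Reply with a follow-up search query, or DONE if the notes suffice."
        )
        if query.strip() == "DONE":
            break
    # higher-order synthesis: turn the collected notes into a long-form report
    return llm(
        f"Write an analyst-grade report answering: {question}\n"
        f"Base it only on these notes: {notes}"
    )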
From Real to Synthetic: Synthesizing Millions of Diversified and Complicated User Instructions with Attributed Grounding
The pursuit of diverse, complex, and large-scale instruction data is crucial for automatically aligning large language models (LLMs). While there are methods capable of generating synthetic instructions at scale, they either suffer from li…
Pro3D-Editor: A Progressive-Views Perspective for Consistent and Precise 3D Editing
Text-guided 3D editing aims to precisely edit semantically relevant local 3D regions, which has significant potential for various practical applications ranging from 3D games to film production. Existing methods typically follow a view-ind…
Rationales Are Not Silver Bullets: Measuring the Impact of Rationales on Model Performance and Reliability
Training language models with rationale augmentation has been shown to be beneficial in many existing works. In this paper, we find that this prevailing view does not hold consistently. We conduct comprehensive investigations to tho…
MIRROR: Multi-agent Intra- and Inter-Reflection for Optimized Reasoning in Tool Learning
Complex tasks involving tool integration pose significant challenges for Large Language Models (LLMs), leading to the emergence of multi-agent workflows as a promising solution. Reflection has emerged as an effective strategy for correctin…
Leveraging Importance Sampling to Detach Alignment Modules from Large Language Models
The widespread adoption of large language models (LLMs) across industries has increased the demand for high-quality and customizable outputs. However, traditional alignment methods often require retraining large pretrained models, making i…
Training LLM-Based Agents with Synthetic Self-Reflected Trajectories and Partial Masking
Autonomous agents, which perceive environments and take actions to achieve goals, have become increasingly feasible with the advancements in large language models (LLMs). However, current powerful agents often depend on sophisticated promp…
DACL-RAG: Data Augmentation Strategy with Curriculum Learning for Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) is an effective method to enhance the capabilities of large language models (LLMs). Existing methods typically optimize the retriever or the generator in a RAG system by directly using the top-k retriev…
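For context, the "top-k retrieved documents" step the preview mentions is the standard retrieve-then-generate pattern sketched below. This is generic RAG background, not the DACL-RAG method itself; embed and generate stand in for an embedding model and an LLM and are hypothetical placeholders.

import numpy as np

def rag_answer(query, corpus, embed, generate, k=5):
    """Retrieve the top-k most similar documents, then condition generation on them."""
    q = embed(query)                                     # query embedding (1-D array)
    doc_vecs = np.stack([embed(doc) for doc in corpus])  # document embeddings
    # cosine similarity between the query and every document
    scores = doc_vecs @ q / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q) + 1e-8
    )
    top_k = [corpus[i] for i in np.argsort(-scores)[:k]]
    context = "\n\n".join(top_k)
    return generate(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")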
HDGlyph: A Hierarchical Disentangled Glyph-Based Framework for Long-Tail Text Rendering in Diffusion Models
Visual text rendering, which aims to accurately integrate specified textual content within generated images, is critical for various applications such as commercial design. Despite recent advances, current methods struggle with long-tail t…
DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization
Customized text-to-video generation with pre-trained large-scale models has recently garnered significant attention by focusing on identity and motion consistency. Existing works typically follow the isolated customization paradigm, wher…
Land use types and soil pH co-mediate bacterial community assembly processes: Application of the neutral community model and null model to determine stochastic and deterministic processes in a subtropical basin, China
Land use regimes strongly impact bacterial microbial communities. However, the ecological processes shaping bacterial community assembly under various land use types and the factors altering the balance between these processes remain poorl…
Automated Creativity Evaluation for Large Language Models: A Reference-Based Approach
Creative writing is a key capability of Large Language Models (LLMs), with potential applications in literature, storytelling, and various creative domains. However, evaluating the creativity of machine-generated texts remains a significan…
HistLLM: A Unified Framework for LLM-Based Multimodal Recommendation with User History Encoding and Compression
While large language models (LLMs) have proven effective in leveraging textual data for recommendations, their application to multimodal recommendation tasks remains relatively underexplored. Although LLMs can process multimodal informatio…
D$^2$iT: Dynamic Diffusion Transformer for Accurate Image Generation
Diffusion models are widely recognized for their ability to generate high-fidelity images. Despite the excellent performance and scalability of the Diffusion Transformer (DiT) architecture, it applies fixed compression across different ima…
Graph Mixture of Experts and Memory-augmented Routers for Multivariate Time Series Anomaly Detection
Multivariate time series (MTS) anomaly detection is a critical task that involves identifying abnormal patterns or events in data that consist of multiple interrelated time series. In order to better model the complex interdependence betwe…
ELDER: Enhancing Lifelong Model Editing with Mixture-of-LoRA
Large language models (LLMs) require model editing to efficiently update specific knowledge within them and avoid factual errors. Most model editing methods are solely designed for single-time use and result in a significant forgetting eff…
CustomContrast: A Multilevel Contrastive Perspective for Subject-Driven Text-to-Image Customization
Subject-driven text-to-image (T2I) customization has drawn significant interest in academia and industry. This task enables pre-trained models to generate novel images based on unique subjects. Existing studies adopt a self-reconstructive …
SVLTA: Benchmarking Vision-Language Temporal Alignment via Synthetic Video Situation
Vision-language temporal alignment is a crucial capability for human dynamic recognition and cognition in real-world scenarios. While existing research focuses on capturing vision-language relevance, it faces limitations due to biased temp…
Leveraging Robust Optimization for LLM Alignment under Distribution Shifts
Preference alignment methods are increasingly critical for steering large language models (LLMs) to generate outputs consistent with human values. While recent approaches often rely on synthetic data generated by LLMs for scalability and c…
RealGeneral: Unifying Visual Generation via Temporal In-Context Learning with Video Models
Unifying diverse image generation tasks within a single framework remains a fundamental challenge in visual generation. While large language models (LLMs) achieve unification through task-agnostic data and generation, existing visual gener…
On-the-fly Preference Alignment via Principle-Guided Decoding
With the rapidly expanding landscape of large language models, aligning model generations with human values and preferences is becoming increasingly important. Popular alignment methods, such as Reinforcement Learning from Human Feedback, …