Maosong Sun
Densing law of LLMs
Large language models (LLMs) have emerged as a milestone in artificial intelligence. The scaling law indicates that the performance of LLMs can continually improve as the model size increases, which poses challenges for training and deploy…
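As a rough illustration of the scaling-law relationship this abstract alludes to (not the paper's own formulation or data), the sketch below fits a power law L(N) ≈ a·N^(−α) to hypothetical (parameter count, loss) pairs; the numbers are made up for demonstration.

```python
import numpy as np

# Hypothetical (model size in parameters, validation loss) pairs -- illustrative only.
sizes = np.array([1e8, 3e8, 1e9, 3e9, 1e10])
losses = np.array([3.10, 2.85, 2.62, 2.43, 2.27])

# Fit a power law  L(N) ~= a * N**(-alpha)  by linear regression in log-log space.
slope, intercept = np.polyfit(np.log(sizes), np.log(losses), deg=1)
alpha, a = -slope, np.exp(intercept)

print(f"fitted exponent alpha = {alpha:.3f}, coefficient a = {a:.3f}")
# Extrapolate the fitted curve to a larger model size.
print(f"predicted loss at 1e11 params: {a * 1e11 ** (-alpha):.3f}")
```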
A Goal Without a Plan Is Just a Wish: Efficient and Effective Global Planner Training for Long-Horizon Agent Tasks
Agents based on large language models (LLMs) struggle with blind trial-and-error and hallucinated actions due to a lack of global planning in long-horizon tasks. In this paper, we introduce a plan-and-execute framework and …
On LLM-Based Scientific Inductive Reasoning Beyond Equations
As large language models (LLMs) increasingly exhibit human-like capabilities, a fundamental question emerges: How can we enable LLMs to learn the underlying patterns from limited examples in entirely novel environments and apply them effec…
Chunks as Arms: Multi-Armed Bandit-Guided Sampling for Long-Context LLM Preference Optimization
Long-context modeling is critical for a wide range of real-world tasks, including long-context question answering, summarization, and complex reasoning tasks. Recent studies have explored fine-tuning Large Language Models (LLMs) with synth…
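The multi-armed-bandit framing in the title can be illustrated with a generic UCB1 selector over document chunks; the chunk list and the reward function below are hypothetical stand-ins, not the paper's actual sampling signal.

```python
import math
import random

def ucb1_select(counts, values, t, c=1.4):
    """Pick the arm (chunk index) with the highest UCB1 score."""
    scores = []
    for n, v in zip(counts, values):
        if n == 0:
            return counts.index(0)          # try every chunk at least once
        scores.append(v / n + c * math.sqrt(math.log(t) / n))
    return scores.index(max(scores))

# Hypothetical chunks and a stand-in reward (e.g. improvement in a preference score).
chunks = ["chunk_a", "chunk_b", "chunk_c", "chunk_d"]
reward = lambda chunk: random.random()      # placeholder for a real training signal

counts = [0] * len(chunks)
values = [0.0] * len(chunks)
for t in range(1, 101):
    i = ucb1_select(counts, values, t)
    counts[i] += 1
    values[i] += reward(chunks[i])
print("selection counts per chunk:", counts)
```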
WGRAMMAR: Leverage Prior Knowledge to Accelerate Structured Decoding
Structured decoding enables large language models (LLMs) to generate outputs in formats required by downstream systems, such as HTML or JSON. However, existing methods suffer from efficiency bottlenecks due to grammar compilation, state tr…
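A toy, character-level sketch of the general idea of structured decoding with precompiled constraints: the fixed output skeleton plays the role of "prior knowledge", literal text is forced verbatim, and only typed slots are generated under constraints. The template, slot types, and slot length are invented for illustration; this is not the WGRAMMAR implementation, which operates on token logits and compiled grammars.

```python
import random

# Fixed JSON-like skeleton known in advance; only the typed slots are free.
TEMPLATE = ['{"name": "', "<STRING>", '", "age": ', "<NUMBER>", "}"]

SLOT_CHARS = {"<STRING>": "abcdefghijklmnopqrstuvwxyz",
              "<NUMBER>": "0123456789"}

def constrained_generate(pick, slot_len=3):
    """Walk the template, forcing literals and constraining slot characters."""
    out = []
    for segment in TEMPLATE:
        if segment in SLOT_CHARS:
            out.extend(pick(SLOT_CHARS[segment]) for _ in range(slot_len))
        else:
            out.append(segment)
    return "".join(out)

print(constrained_generate(lambda allowed: random.choice(allowed)))
# e.g. {"name": "qzv", "age": 472}
```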
AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs
Kernel development in deep learning requires optimizing computational units across hardware while balancing memory management, parallelism, and hardware-specific optimizations through extensive empirical tuning. Although domain-specific la…
Efficient GPT-4V level multimodal large language model for deployment on edge devices
Multimodal large language models have revolutionized AI research and industry, paving the way toward the next milestone. However, their large sizes and high computational costs restrict deployment to cloud servers, limiting use in mobile, …
KG-Infused RAG: Augmenting Corpus-Based RAG with External Knowledge Graphs
Retrieval-Augmented Generation (RAG) improves factual accuracy by grounding responses in external knowledge. However, existing RAG methods either rely solely on text corpora and neglect structural knowledge, or build ad-hoc knowledge graph…
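A minimal sketch of combining corpus passages with knowledge-graph triples in a single RAG prompt, in the spirit of the abstract above. The two retrieval functions are hypothetical stubs, not the paper's retrievers.

```python
def retrieve_passages(query: str, k: int = 3) -> list[str]:
    """Hypothetical dense-retrieval stub over a text corpus."""
    return [f"passage about {query} #{i}" for i in range(k)]

def retrieve_triples(query: str, k: int = 3) -> list[tuple[str, str, str]]:
    """Hypothetical lookup stub over an external knowledge graph."""
    return [("EntityA", "related_to", "EntityB")] * k

def build_prompt(query: str) -> str:
    passages = "\n".join(retrieve_passages(query))
    facts = "\n".join(f"({s}, {p}, {o})" for s, p, o in retrieve_triples(query))
    return (f"Answer using both sources.\n"
            f"Corpus passages:\n{passages}\n"
            f"Knowledge-graph facts:\n{facts}\n"
            f"Question: {query}\nAnswer:")

print(build_prompt("Who founded the company?"))
```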
AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning
The recent progress of large language model agents has opened new possibilities for automating tasks through graphical user interfaces (GUIs), especially in mobile environments where intelligent interaction can greatly enhance usability. H…
A*-Thought: Efficient Reasoning via Bidirectional Compression for Low-Resource Settings
Large Reasoning Models (LRMs) achieve superior performance by extending the thought length. However, a lengthy thinking trajectory reduces efficiency. Most existing methods are built on the assumption of overthinking and at…
Enhancing Long-Chain Reasoning Distillation through Error-Aware Self-Reflection
Large Language Models (LLMs) have exhibited strong reasoning capabilities and achieved remarkable performance in mathematical problem-solving tasks. Recently, distilling reasoning ability from long-form Chains-of-Thought (CoTs) has emerged…
Co-Saving: Resource Aware Multi-Agent Collaboration for Software Development
Recent advancements in Large Language Models (LLMs) and autonomous agents have demonstrated remarkable capabilities across various domains. However, standalone agents frequently encounter limitations when handling complex tasks that demand…
Learning to Route Queries Across Knowledge Bases for Step-wise Retrieval-Augmented Reasoning
Multimodal Retrieval-Augmented Generation (MRAG) has shown promise in mitigating hallucinations in Multimodal Large Language Models (MLLMs) by incorporating external knowledge during generation. Existing MRAG methods typically adopt a stat…
Monocle: Hybrid Local-Global In-Context Evaluation for Long-Text Generation with Uncertainty-Based Active Learning
Assessing the quality of long-form, model-generated text is challenging, even with advanced LLM-as-a-Judge methods, due to performance degradation as input length increases. To address this issue, we propose a divide-and-conquer approach, …
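A sketch of the generic divide-and-conquer idea mentioned above: score chunks locally, then blend with a score over a crude global digest. The judge function, chunking scheme, and weighting are all placeholders, not the paper's hybrid evaluation protocol.

```python
def judge(text: str) -> float:
    """Stand-in for an LLM-as-a-Judge call returning a quality score in [0, 1]."""
    words = text.split()
    return min(1.0, len(set(words)) / max(len(words), 1))

def evaluate_long_text(text: str, chunk_words: int = 200, global_weight: float = 0.5) -> float:
    """Divide-and-conquer evaluation: average local chunk scores, then blend
    with a score of a short global view built from the chunks."""
    words = text.split()
    chunks = [" ".join(words[i:i + chunk_words]) for i in range(0, len(words), chunk_words)]
    if not chunks:
        return 0.0
    local = sum(judge(c) for c in chunks) / len(chunks)
    global_view = " ".join(c.split(".")[0] for c in chunks)   # crude global digest
    return global_weight * judge(global_view) + (1 - global_weight) * local
```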
The Overthinker's DIET: Cutting Token Calories with DIfficulty-AwarE Training
Recent large language models (LLMs) exhibit impressive reasoning but often over-think, generating excessively long responses that hinder efficiency. We introduce DIET (DIfficulty-AwarE Training), a framework that systematically cuts these…
Teaching Large Language Models to Maintain Contextual Faithfulness via Synthetic Tasks and Reinforcement Learning
Teaching large language models (LLMs) to be faithful to the provided context is crucial for building reliable information-seeking systems. Therefore, we propose a systematic framework, CANOE, to reduce faithfulness hallucinations of LLMs a…
From Unaligned to Aligned: Scaling Multilingual LLMs with Multi-Way Parallel Corpora
Continued pretraining and instruction tuning on large-scale multilingual data have proven to be effective in scaling large language models (LLMs) to low-resource languages. However, the unaligned nature of such data limits its ability to e…
ToLeaP: Rethinking Development of Tool Learning with Large Language Models
Tool learning, which enables large language models (LLMs) to utilize external tools effectively, has garnered increasing attention for its potential to revolutionize productivity across industries. Despite rapid development in tool learnin…
LLM×MapReduce-V2: Entropy-Driven Convolutional Test-Time Scaling for Generating Long-Form Articles from Extremely Long Resources
Long-form generation is crucial for a wide range of practical applications, typically categorized into short-to-long and long-to-long generation. While short-to-long generation has received considerable attention, generating long texts f…
AIR: A Systematic Analysis of Annotations, Instructions, and Response Pairs in Preference Dataset
Preference learning is critical for aligning large language models (LLMs) with human values, yet its success hinges on high-quality datasets comprising three core components: Preference Annotations, Instructions, and …
UltraRAG: A Modular and Automated Toolkit for Adaptive Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) significantly enhances the performance of large language models (LLMs) in downstream tasks by integrating external knowledge. To facilitate researchers in deploying RAG systems, various RAG toolkits hav…
Will Pre-Training Ever End? A First Step Toward Next-Generation Foundation MLLMs via Self-Improving Systematic Cognition
Recent progress in (multimodal) large language models ((M)LLMs) has shifted focus from pre-training to inference-time computation and post-training optimization, largely due to concerns over the availability of high-quality human data. How…
Judge as A Judge: Improving the Evaluation of Retrieval-Augmented Generation through the Judge-Consistency of Large Language Models
Retrieval-Augmented Generation (RAG) has proven its effectiveness in alleviating hallucinations for Large Language Models (LLMs). However, existing automated evaluation metrics cannot fairly evaluate the outputs generated by RAG models dur…
Learning to Generate Structured Output with Schema Reinforcement Learning
This study investigates the structured generation capabilities of large language models (LLMs), focusing on producing valid JSON outputs against a given schema. Despite the widespread use of JSON in integrating language models with program…
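One natural training signal for schema-constrained generation is a validity reward. The sketch below, assuming the third-party jsonschema package, scores an output by whether it parses as JSON and satisfies a schema; the tiered reward values are illustrative, not the paper's exact reward design.

```python
import json
import jsonschema  # third-party: pip install jsonschema

def schema_reward(output_text: str, schema: dict) -> float:
    """1.0 if the output parses as JSON and satisfies the schema,
    0.5 if it parses but violates the schema, 0.0 if it is not valid JSON."""
    try:
        instance = json.loads(output_text)
    except json.JSONDecodeError:
        return 0.0
    try:
        jsonschema.validate(instance=instance, schema=schema)
        return 1.0
    except jsonschema.ValidationError:
        return 0.5

schema = {"type": "object",
          "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
          "required": ["name", "age"]}
print(schema_reward('{"name": "Ada", "age": 36}', schema))   # 1.0
print(schema_reward('{"name": "Ada"}', schema))              # 0.5
```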
AgentRM: Enhancing Agent Generalization with Reward Modeling
Existing LLM-based agents have achieved strong performance on held-in tasks, but their generalizability to unseen tasks remains poor. Hence, some recent work focuses on fine-tuning the policy model with more diverse tasks to improve the gene…
NotaGen: Advancing Musicality in Symbolic Music Generation with Large Language Model Training Paradigms
We introduce NotaGen, a symbolic music generation model aiming to explore the potential of producing high-quality classical sheet music. Inspired by the success of Large Language Models (LLMs), NotaGen adopts pre-training, fine-tuning, and…
HIPPO: Enhancing the Table Understanding Capability of Large Language Models through Hybrid-Modal Preference Optimization
Tabular data contains rich structural semantics and plays a crucial role in organizing and manipulating information. To better capture these structural semantics, this paper introduces the HybrId-modal Preference oPtimizatiOn (HIPPO) model…
ParamMute: Suppressing Knowledge-Critical FFNs for Faithful Retrieval-Augmented Generation
Large language models (LLMs) integrated with retrieval-augmented generation (RAG) have improved factuality by grounding outputs in external evidence. However, they remain susceptible to unfaithful generation, where outputs contradict retri…
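A minimal PyTorch sketch of the general mechanism of suppressing FFN sub-modules in selected layers via forward hooks. The toy block, the chosen layer indices, and the scaling factor are assumptions for illustration; the paper's procedure for identifying knowledge-critical FFNs is not reproduced here.

```python
import torch
import torch.nn as nn

# Toy block: attention omitted; only the FFN sub-module matters for this sketch.
class ToyBlock(nn.Module):
    def __init__(self, d=64):
        super().__init__()
        self.ffn = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
    def forward(self, x):
        return x + self.ffn(x)

blocks = nn.ModuleList([ToyBlock() for _ in range(6)])

def suppress_ffn(module, inputs, output, scale=0.1):
    """Forward hook that scales down the FFN output of the hooked layer."""
    return output * scale

# Suppose layers 3 and 4 were identified as knowledge-critical; dampen their FFNs.
for idx in (3, 4):
    blocks[idx].ffn.register_forward_hook(suppress_ffn)

x = torch.randn(2, 10, 64)
for blk in blocks:
    x = blk(x)
```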
TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators
Triton, a high-level Python-like language designed for building efficient GPU kernels, is widely adopted in deep learning frameworks due to its portability, flexibility, and accessibility. However, programming and parallel optimization sti…
FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling
Speculative sampling has emerged as an important technique for accelerating the auto-regressive generation process of large language models (LLMs) by utilizing a draft-then-verify mechanism to produce multiple tokens per forward pass. Whil…
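To make the frequency-ranked idea concrete, the sketch below restricts a draft model's next-token distribution to the most frequent vocabulary tokens by masking the rest. The vocabulary size, frequency table, and cutoff are invented for illustration and do not reproduce FR-Spec's full draft-then-verify pipeline.

```python
import torch

def frequency_ranked_logits(logits: torch.Tensor, token_freq: torch.Tensor, keep: int) -> torch.Tensor:
    """Restrict a draft model's next-token distribution to the `keep` most
    frequent vocabulary tokens by masking the rest to -inf (illustrative only)."""
    top_ids = torch.topk(token_freq, keep).indices
    mask = torch.full_like(logits, float("-inf"))
    mask[..., top_ids] = 0.0
    return logits + mask

vocab_size = 32_000
logits = torch.randn(1, vocab_size)     # draft model's raw scores (stand-in)
token_freq = torch.rand(vocab_size)     # stand-in corpus token frequencies
restricted = frequency_ranked_logits(logits, token_freq, keep=8_000)
next_token = torch.argmax(restricted, dim=-1)
print(next_token)
```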