Maosong Sun
Densing law of LLMs
Large language models (LLMs) have emerged as a milestone in artificial intelligence. The scaling law indicates that the performance of LLMs can continually improve as the model size increases, which poses challenges for training and deploy…
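As a rough illustration of the scaling-law relationship this abstract alludes to (not the paper's own formulation or data), the sketch below fits a power law L(N) ≈ a·N^(−α) to hypothetical (parameter count, loss) pairs; the numbers are made up for demonstration.

```python
import numpy as np

# Hypothetical (model size in parameters, validation loss) pairs -- illustrative only.
sizes = np.array([1e8, 3e8, 1e9, 3e9, 1e10])
losses = np.array([3.10, 2.85, 2.62, 2.43, 2.27])

# Fit a power law  L(N) ~= a * N**(-alpha)  by linear regression in log-log space.
slope, intercept = np.polyfit(np.log(sizes), np.log(losses), deg=1)
alpha, a = -slope, np.exp(intercept)

print(f"fitted exponent alpha = {alpha:.3f}, coefficient a = {a:.3f}")
# Extrapolate the fitted curve to a larger model size.
print(f"predicted loss at 1e11 params: {a * 1e11 ** (-alpha):.3f}")
```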
A Goal Without a Plan Is Just a Wish: Efficient and Effective Global Planner Training for Long-Horizon Agent Tasks
Agents based on large language models (LLMs) struggle with blind trial-and-error and hallucinated actions due to a lack of global planning in long-horizon tasks. In this paper, we introduce a plan-and-execute framework and …
On LLM-Based Scientific Inductive Reasoning Beyond Equations
As large language models (LLMs) increasingly exhibit human-like capabilities, a fundamental question emerges: How can we enable LLMs to learn the underlying patterns from limited examples in entirely novel environments and apply them effec…
Chunks as Arms: Multi-Armed Bandit-Guided Sampling for Long-Context LLM Preference Optimization
Long-context modeling is critical for a wide range of real-world tasks, including long-context question answering, summarization, and complex reasoning tasks. Recent studies have explored fine-tuning Large Language Models (LLMs) with synth…
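The multi-armed-bandit framing in the title can be illustrated with a generic UCB1 selector over document chunks; the chunk list and the reward function below are hypothetical stand-ins, not the paper's actual sampling signal.

```python
import math
import random

def ucb1_select(counts, values, t, c=1.4):
    """Pick the arm (chunk index) with the highest UCB1 score."""
    scores = []
    for n, v in zip(counts, values):
        if n == 0:
            return counts.index(0)          # try every chunk at least once
        scores.append(v / n + c * math.sqrt(math.log(t) / n))
    return scores.index(max(scores))

# Hypothetical chunks and a stand-in reward (e.g. improvement in a preference score).
chunks = ["chunk_a", "chunk_b", "chunk_c", "chunk_d"]
reward = lambda chunk: random.random()      # placeholder for a real training signal

counts = [0] * len(chunks)
values = [0.0] * len(chunks)
for t in range(1, 101):
    i = ucb1_select(counts, values, t)
    counts[i] += 1
    values[i] += reward(chunks[i])
print("selection counts per chunk:", counts)
```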
WGRAMMAR: Leverage Prior Knowledge to Accelerate Structured Decoding
Structured decoding enables large language models (LLMs) to generate outputs in formats required by downstream systems, such as HTML or JSON. However, existing methods suffer from efficiency bottlenecks due to grammar compilation, state tr…
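A toy, character-level sketch of the general idea of structured decoding with precompiled constraints: the fixed output skeleton plays the role of "prior knowledge", literal text is forced verbatim, and only typed slots are generated under constraints. The template, slot types, and slot length are invented for illustration; this is not the WGRAMMAR implementation, which operates on token logits and compiled grammars.

```python
import random

# Fixed JSON-like skeleton known in advance; only the typed slots are free.
TEMPLATE = ['{"name": "', "<STRING>", '", "age": ', "<NUMBER>", "}"]

SLOT_CHARS = {"<STRING>": "abcdefghijklmnopqrstuvwxyz",
              "<NUMBER>": "0123456789"}

def constrained_generate(pick, slot_len=3):
    """Walk the template, forcing literals and constraining slot characters."""
    out = []
    for segment in TEMPLATE:
        if segment in SLOT_CHARS:
            out.extend(pick(SLOT_CHARS[segment]) for _ in range(slot_len))
        else:
            out.append(segment)
    return "".join(out)

print(constrained_generate(lambda allowed: random.choice(allowed)))
# e.g. {"name": "qzv", "age": 472}
```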
AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs
Kernel development in deep learning requires optimizing computational units across hardware while balancing memory management, parallelism, and hardware-specific optimizations through extensive empirical tuning. Although domain-specific la…
Efficient GPT-4V level multimodal large language model for deployment on edge devices
Multimodal large language models have revolutionized AI research and industry, paving the way toward the next milestone. However, their large sizes and high computational costs restrict deployment to cloud servers, limiting use in mobile, …
KG-Infused RAG: Augmenting Corpus-Based RAG with External Knowledge Graphs
Retrieval-Augmented Generation (RAG) improves factual accuracy by grounding responses in external knowledge. However, existing RAG methods either rely solely on text corpora and neglect structural knowledge, or build ad-hoc knowledge graph…
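A minimal sketch of combining corpus passages with knowledge-graph triples in a single RAG prompt, in the spirit of the abstract above. The two retrieval functions are hypothetical stubs, not the paper's retrievers.

```python
def retrieve_passages(query: str, k: int = 3) -> list[str]:
    """Hypothetical dense-retrieval stub over a text corpus."""
    return [f"passage about {query} #{i}" for i in range(k)]

def retrieve_triples(query: str, k: int = 3) -> list[tuple[str, str, str]]:
    """Hypothetical lookup stub over an external knowledge graph."""
    return [("EntityA", "related_to", "EntityB")] * k

def build_prompt(query: str) -> str:
    passages = "\n".join(retrieve_passages(query))
    facts = "\n".join(f"({s}, {p}, {o})" for s, p, o in retrieve_triples(query))
    return (f"Answer using both sources.\n"
            f"Corpus passages:\n{passages}\n"
            f"Knowledge-graph facts:\n{facts}\n"
            f"Question: {query}\nAnswer:")

print(build_prompt("Who founded the company?"))
```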
AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning
The recent progress of large language model agents has opened new possibilities for automating tasks through graphical user interfaces (GUIs), especially in mobile environments where intelligent interaction can greatly enhance usability. H…
A*-Thought: Efficient Reasoning via Bidirectional Compression for Low-Resource Settings
Large Reasoning Models (LRMs) achieve superior performance by extending the thought length. However, a lengthy thinking trajectory reduces efficiency. Most existing methods are built on the assumption of overthinking and at…
Enhancing Long-Chain Reasoning Distillation through Error-Aware Self-Reflection
Large Language Models (LLMs) have exhibited strong reasoning capabilities and achieved remarkable performance in mathematical problem-solving tasks. Recently, distilling reasoning ability from long-form Chains-of-Thought (CoTs) has emerged…
Co-Saving: Resource Aware Multi-Agent Collaboration for Software Development
Recent advancements in Large Language Models (LLMs) and autonomous agents have demonstrated remarkable capabilities across various domains. However, standalone agents frequently encounter limitations when handling complex tasks that demand…
Learning to Route Queries Across Knowledge Bases for Step-wise Retrieval-Augmented Reasoning
Multimodal Retrieval-Augmented Generation (MRAG) has shown promise in mitigating hallucinations in Multimodal Large Language Models (MLLMs) by incorporating external knowledge during generation. Existing MRAG methods typically adopt a stat…
Monocle: Hybrid Local-Global In-Context Evaluation for Long-Text Generation with Uncertainty-Based Active Learning
Assessing the quality of long-form, model-generated text is challenging, even with advanced LLM-as-a-Judge methods, due to performance degradation as input length increases. To address this issue, we propose a divide-and-conquer approach, …
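A sketch of the generic divide-and-conquer idea mentioned above: score chunks locally, then blend with a score over a crude global digest. The judge function, chunking scheme, and weighting are all placeholders, not the paper's hybrid evaluation protocol.

```python
def judge(text: str) -> float:
    """Stand-in for an LLM-as-a-Judge call returning a quality score in [0, 1]."""
    words = text.split()
    return min(1.0, len(set(words)) / max(len(words), 1))

def evaluate_long_text(text: str, chunk_words: int = 200, global_weight: float = 0.5) -> float:
    """Divide-and-conquer evaluation: average local chunk scores, then blend
    with a score of a short global view built from the chunks."""
    words = text.split()
    chunks = [" ".join(words[i:i + chunk_words]) for i in range(0, len(words), chunk_words)]
    if not chunks:
        return 0.0
    local = sum(judge(c) for c in chunks) / len(chunks)
    global_view = " ".join(c.split(".")[0] for c in chunks)   # crude global digest
    return global_weight * judge(global_view) + (1 - global_weight) * local
```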
The Overthinker's DIET: Cutting Token Calories with DIfficulty-AwarE Training
Recent large language models (LLMs) exhibit impressive reasoning but often over-think, generating excessively long responses that hinder efficiency. We introduce DIET (DIfficulty-AwarE Training), a framework that systematically cuts these…
Teaching Large Language Models to Maintain Contextual Faithfulness via Synthetic Tasks and Reinforcement Learning
Teaching large language models (LLMs) to be faithful to the provided context is crucial for building reliable information-seeking systems. Therefore, we propose a systematic framework, CANOE, to reduce faithfulness hallucinations of LLMs a…
From Unaligned to Aligned: Scaling Multilingual LLMs with Multi-Way Parallel Corpora
Continued pretraining and instruction tuning on large-scale multilingual data have proven to be effective in scaling large language models (LLMs) to low-resource languages. However, the unaligned nature of such data limits its ability to e…
ToLeaP: Rethinking Development of Tool Learning with Large Language Models
Tool learning, which enables large language models (LLMs) to utilize external tools effectively, has garnered increasing attention for its potential to revolutionize productivity across industries. Despite rapid development in tool learnin…
LLM×MapReduce-V2: Entropy-Driven Convolutional Test-Time Scaling for Generating Long-Form Articles from Extremely Long Resources
Long-form generation is crucial for a wide range of practical applications, typically categorized into short-to-long and long-to-long generation. While short-to-long generation has received considerable attention, generating long texts f…
AIR: A Systematic Analysis of Annotations, Instructions, and Response Pairs in Preference Dataset
Preference learning is critical for aligning large language models (LLMs) with human values, yet its success hinges on high-quality datasets comprising three core components: Preference Annotations, Instructions, and …
UltraRAG: A Modular and Automated Toolkit for Adaptive Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) significantly enhances the performance of large language models (LLMs) in downstream tasks by integrating external knowledge. To facilitate researchers in deploying RAG systems, various RAG toolkits hav…
Will Pre-Training Ever End? A First Step Toward Next-Generation Foundation MLLMs via Self-Improving Systematic Cognition
Recent progress in (multimodal) large language models ((M)LLMs) has shifted focus from pre-training to inference-time computation and post-training optimization, largely due to concerns over the availability of high-quality human data. How…
Judge as A Judge: Improving the Evaluation of Retrieval-Augmented Generation through the Judge-Consistency of Large Language Models
Retrieval-Augmented Generation (RAG) has proven its effectiveness in alleviating hallucinations for Large Language Models (LLMs). However, existing automated evaluation metrics cannot fairly evaluate the outputs generated by RAG models dur…
Learning to Generate Structured Output with Schema Reinforcement Learning
This study investigates the structured generation capabilities of large language models (LLMs), focusing on producing valid JSON outputs against a given schema. Despite the widespread use of JSON in integrating language models with program…
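One natural training signal for schema-constrained generation is a validity reward. The sketch below, assuming the third-party jsonschema package, scores an output by whether it parses as JSON and satisfies a schema; the tiered reward values are illustrative, not the paper's exact reward design.

```python
import json
import jsonschema  # third-party: pip install jsonschema

def schema_reward(output_text: str, schema: dict) -> float:
    """1.0 if the output parses as JSON and satisfies the schema,
    0.5 if it parses but violates the schema, 0.0 if it is not valid JSON."""
    try:
        instance = json.loads(output_text)
    except json.JSONDecodeError:
        return 0.0
    try:
        jsonschema.validate(instance=instance, schema=schema)
        return 1.0
    except jsonschema.ValidationError:
        return 0.5

schema = {"type": "object",
          "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
          "required": ["name", "age"]}
print(schema_reward('{"name": "Ada", "age": 36}', schema))   # 1.0
print(schema_reward('{"name": "Ada"}', schema))              # 0.5
```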
AgentRM: Enhancing Agent Generalization with Reward Modeling
Existing LLM-based agents have achieved strong performance on held-in tasks, but their generalizability to unseen tasks remains poor. Hence, some recent work focuses on fine-tuning the policy model with more diverse tasks to improve the gene…
NotaGen: Advancing Musicality in Symbolic Music Generation with Large Language Model Training Paradigms
We introduce NotaGen, a symbolic music generation model aiming to explore the potential of producing high-quality classical sheet music. Inspired by the success of Large Language Models (LLMs), NotaGen adopts pre-training, fine-tuning, and…
HIPPO: Enhancing the Table Understanding Capability of Large Language Models through Hybrid-Modal Preference Optimization
Tabular data contains rich structural semantics and plays a crucial role in organizing and manipulating information. To better capture these structural semantics, this paper introduces the HybrId-modal Preference oPtimizatiOn (HIPPO) model…
ParamMute: Suppressing Knowledge-Critical FFNs for Faithful Retrieval-Augmented Generation
Large language models (LLMs) integrated with retrieval-augmented generation (RAG) have improved factuality by grounding outputs in external evidence. However, they remain susceptible to unfaithful generation, where outputs contradict retri…
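A minimal PyTorch sketch of the general mechanism of suppressing FFN sub-modules in selected layers via forward hooks. The toy block, the chosen layer indices, and the scaling factor are assumptions for illustration; the paper's procedure for identifying knowledge-critical FFNs is not reproduced here.

```python
import torch
import torch.nn as nn

# Toy block: attention omitted; only the FFN sub-module matters for this sketch.
class ToyBlock(nn.Module):
    def __init__(self, d=64):
        super().__init__()
        self.ffn = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
    def forward(self, x):
        return x + self.ffn(x)

blocks = nn.ModuleList([ToyBlock() for _ in range(6)])

def suppress_ffn(module, inputs, output, scale=0.1):
    """Forward hook that scales down the FFN output of the hooked layer."""
    return output * scale

# Suppose layers 3 and 4 were identified as knowledge-critical; dampen their FFNs.
for idx in (3, 4):
    blocks[idx].ffn.register_forward_hook(suppress_ffn)

x = torch.randn(2, 10, 64)
for blk in blocks:
    x = blk(x)
```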
TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators
Triton, a high-level Python-like language designed for building efficient GPU kernels, is widely adopted in deep learning frameworks due to its portability, flexibility, and accessibility. However, programming and parallel optimization sti…
FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling
Speculative sampling has emerged as an important technique for accelerating the auto-regressive generation process of large language models (LLMs) by utilizing a draft-then-verify mechanism to produce multiple tokens per forward pass. Whil…
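To make the frequency-ranked idea concrete, the sketch below restricts a draft model's next-token distribution to the most frequent vocabulary tokens by masking the rest. The vocabulary size, frequency table, and cutoff are invented for illustration and do not reproduce FR-Spec's full draft-then-verify pipeline.

```python
import torch

def frequency_ranked_logits(logits: torch.Tensor, token_freq: torch.Tensor, keep: int) -> torch.Tensor:
    """Restrict a draft model's next-token distribution to the `keep` most
    frequent vocabulary tokens by masking the rest to -inf (illustrative only)."""
    top_ids = torch.topk(token_freq, keep).indices
    mask = torch.full_like(logits, float("-inf"))
    mask[..., top_ids] = 0.0
    return logits + mask

vocab_size = 32_000
logits = torch.randn(1, vocab_size)     # draft model's raw scores (stand-in)
token_freq = torch.rand(vocab_size)     # stand-in corpus token frequencies
restricted = frequency_ranked_logits(logits, token_freq, keep=8_000)
next_token = torch.argmax(restricted, dim=-1)
print(next_token)
```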