Wanjun Zhong
Reverse-Engineered Reasoning for Open-Ended Generation
While the "deep reasoning" paradigm has spurred significant advances in verifiable domains like mathematics, its application to open-ended, creative generation remains a critical challenge. The two dominant methods for instilling reasoni…
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
The development of autonomous agents for graphical user interfaces (GUIs) presents major challenges in artificial intelligence. While recent advances in native agent models have shown promise by unifying perception, reasoning, action, and …
LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation
Code generation aims to automatically generate code from input requirements, significantly enhancing development efficiency. Recent large language model (LLM)-based approaches have shown promising results and revolutionized code generati…
AdaCtrl: Towards Adaptive and Controllable Reasoning via Difficulty-Aware Budgeting
Modern large reasoning models demonstrate impressive problem-solving capabilities by employing sophisticated reasoning strategies. However, they often struggle to balance efficiency and effectiveness, frequently generating unnecessarily le…
Self-Reasoning Language Models: Unfold Hidden Reasoning Chains with Few Reasoning Catalyst
Inference-time scaling has attracted much attention, as it significantly enhances the performance of Large Language Models (LLMs) on complex reasoning tasks by increasing the length of the Chain-of-Thought. These longer intermediate reasoning ra…
Acting Less is Reasoning More! Teaching Model to Act Efficiently
Tool-integrated reasoning (TIR) augments large language models (LLMs) with the ability to invoke external tools during long-form reasoning, such as search engines and code interpreters, to solve tasks beyond the capabilities of internal re…
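As a rough illustration of the tool-integrated reasoning pattern described above (a generic reason-act loop, not this paper's training recipe), the sketch below lets a model interleave text with tool calls; `call_model`, the `<tool>` tag syntax, and the toy interpreter are all hypothetical placeholders.

```python
# Minimal sketch of a tool-integrated reasoning (TIR) loop. `call_model`
# is a scripted stand-in for an LLM API; the <tool> tag syntax and the
# toy "code interpreter" are illustrative, not this paper's protocol.
import re

def run_python(code: str) -> str:
    # Toy code interpreter: evaluates a single arithmetic expression.
    return str(eval(code, {"__builtins__": {}}, {}))

TOOLS = {"python": run_python}

def call_model(transcript: str) -> str:
    # Placeholder policy: emit one tool call, then a final answer.
    if "<result>" not in transcript:
        return 'Let me compute. <tool name="python">2**10</tool>'
    return "The answer is 1024."

def tir_loop(question: str, max_steps: int = 5) -> str:
    transcript = question
    for _ in range(max_steps):
        step = call_model(transcript)
        match = re.search(r'<tool name="(\w+)">(.*?)</tool>', step, re.S)
        if not match:                       # no tool call -> final answer
            return step
        name, args = match.groups()
        result = TOOLS[name](args)          # invoke the external tool
        transcript += f"\n{step}\n<result>{result}</result>"
    return transcript

print(tir_loop("What is 2 to the 10th power?"))  # -> The answer is 1024.
```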
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
While reasoning models (e.g., DeepSeek R1) trained with reinforcement learning (RL) excel in textual reasoning, they struggle in scenarios requiring structured problem-solving, such as geometric reasoning, concise computation, or complex …
Exploring Iterative Enhancement for Improving Learnersourced Multiple-Choice Question Explanations with Large Language Models
Large language models (LLMs) have demonstrated strong capabilities in language understanding and generation, and their potential in educational contexts is increasingly being explored. One promising area is learnersourcing, where students …
Empowering Self-Learning of LLMs: Inner Knowledge Explicitation as a Catalyst
Self-learning of Large Language Models (LLMs) facilitates their advancement towards super-intelligence by training with self-synthesized experiences. However, a critical challenge is the amplification of hallucinations in generated data du…
UI-TARS: Pioneering Automated GUI Interaction with Native Agents
This paper introduces UI-TARS, a native GUI agent model that solely perceives the screenshots as input and performs human-like interactions (e.g., keyboard and mouse operations). Unlike prevailing agent frameworks that depend on heavily wr…
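The interface the abstract outlines (screenshots in, human-like keyboard and mouse actions out) can be sketched schematically as below; the action dataclasses and the stubbed `predict_action` policy are hypothetical stand-ins for the actual vision-language model, not UI-TARS internals.

```python
# Schematic of a "native" GUI agent interface: raw pixels in, low-level
# actions out. The action types and stubbed policy are illustrative
# placeholders, not UI-TARS internals.
from dataclasses import dataclass

@dataclass
class Click:
    x: int
    y: int

@dataclass
class TypeText:
    text: str

def predict_action(screenshot: bytes, goal: str):
    # Stand-in for the agent model: a real native agent would run a
    # vision-language model over the screenshot here.
    return Click(x=400, y=300)

def agent_loop(goal: str, max_steps: int = 3):
    for step in range(max_steps):
        screenshot = b"\x89PNG..."       # stand-in for a screen capture
        action = predict_action(screenshot, goal)
        print(f"step {step}: {action}")  # a real loop would execute it

agent_loop("open the settings menu")
```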
AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions
Data science tasks involving tabular data present complex challenges that require sophisticated problem-solving approaches. We propose AutoKaggle, a powerful and user-centric framework that assists data scientists in completing daily data …
Agents in Software Engineering: Survey, Landscape, and Vision
In recent years, Large Language Models (LLMs) have achieved remarkable success and have been widely used in various downstream tasks, especially in tasks of the software engineering (SE) field. We find that many studies combining LLMs …
You Augment Me: Exploring ChatGPT-based Data Augmentation for Semantic Code Search
Code search plays a crucial role in software development, enabling developers to retrieve and reuse code using natural language queries. While the performance of code search models improves with an increase in high-quality data, obtaining …
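To make the task concrete, here is a toy semantic code search ranker: it scores snippets against a natural-language query with bag-of-words cosine similarity, a crude stand-in for the learned embeddings such models actually use. The corpus and scoring are illustrative and unrelated to the paper's augmentation method.

```python
# Toy semantic code search: rank code snippets against a natural-language
# query by bag-of-words cosine similarity, a crude stand-in for the
# learned embeddings real code search models use.
import math
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().replace("_", " ").split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Each snippet is paired with a short docstring-like description.
corpus = {
    "def read_file(path): return open(path).read()": "read file contents",
    "def sort_items(xs): return sorted(xs)": "sort a list of items",
}

def search(query: str) -> str:
    q = embed(query)
    return max(corpus, key=lambda code: cosine(q, embed(corpus[code] + " " + code)))

print(search("how do I read a file"))  # -> the read_file snippet
```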
When to Stop? Towards Efficient Code Generation in LLMs with Excess Token Prevention
Code generation aims to automatically generate code snippets that meet given natural language requirements and plays an important role in software development. Although Code LLMs have shown excellent performance in this domain, their long …
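The abstract is cut off before the method, but the general idea of excess-token prevention can be illustrated with a toy early-stop rule: halt decoding once the emitted code parses as a complete snippet rather than running to the length limit. The scripted token stream below is hypothetical and this is not the paper's technique.

```python
# Toy illustration of excess-token prevention: stop decoding as soon as
# the generated code is syntactically complete, instead of letting the
# model run to its maximum length. The token stream is a scripted stub.
import ast

TOKENS = ["def add(a, b):", "\n    return a + b",
          "\n\n# redundant trailing", " chatter the model might emit"]

code = ""
for tok in TOKENS:
    code += tok
    try:
        ast.parse(code)   # parses cleanly -> snippet is complete, stop
        break
    except SyntaxError:
        continue          # still incomplete, keep decoding

print(code)  # only the complete function, no excess tokens
```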
Concise and Precise Context Compression for Tool-Using Language Models
Through reading the documentation in the context, tool-using language models can dynamically extend their capability using external tools. The cost is that we have to input lengthy documentation every time the model needs to use the tool, …
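A back-of-envelope calculation makes the cost concrete: because the full documentation must be re-sent on every tool call, even a modest compression ratio compounds over a session. All numbers below are hypothetical, not figures from the paper.

```python
# Back-of-envelope arithmetic for the cost the abstract points to:
# lengthy documentation is re-sent on every tool call, so a compressed
# substitute compounds into large savings. All numbers are hypothetical.
FULL_DOC_TOKENS = 1200    # length of one tool's documentation
COMPRESSED_TOKENS = 60    # length of a compressed substitute (20x)
CALLS = 500               # tool invocations over a session

naive = FULL_DOC_TOKENS * CALLS
compressed = COMPRESSED_TOKENS * CALLS
print(f"full docs every call: {naive:,} tokens")       # 600,000
print(f"compressed context:   {compressed:,} tokens")  # 30,000
print(f"saving:               {1 - compressed / naive:.0%}")  # 95%
```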
MemoryBank: Enhancing Large Language Models with Long-Term Memory
Large Language Models (LLMs) have drastically reshaped our interactions with artificial intelligence (AI) systems, showcasing impressive performance across an extensive array of tasks. Despite this, a notable hindrance remains—the deficien…
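The truncated abstract does not detail the mechanism, but the general shape of a long-term memory store can be sketched: append dated events, then retrieve the most relevant ones at response time. Keyword overlap below is a placeholder for real retrieval; this is an illustration of the general idea, not MemoryBank's actual design.

```python
# Generic sketch of a long-term memory store for a conversational agent:
# append dated events, retrieve the most relevant at response time.
# Keyword-overlap scoring stands in for real retrieval; this is not
# MemoryBank's design, which the truncated abstract does not describe.
from datetime import date

memory: list[tuple[date, str]] = []

def remember(text: str) -> None:
    memory.append((date.today(), text))

def recall(query: str, k: int = 2) -> list[tuple[date, str]]:
    words = set(query.lower().split())
    return sorted(memory,
                  key=lambda m: len(words & set(m[1].lower().split())),
                  reverse=True)[:k]

remember("User's dog is named Biscuit")
remember("User prefers metric units")
print(recall("what is my dog called", k=1))
```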
CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models
Developing Large Language Models (LLMs) with robust long-context capabilities has been a recent research focus, resulting in the emergence of long-context LLMs proficient in Chinese. However, the evaluation of these models remains underd…
PerLTQA: A Personal Long-Term Memory Dataset for Memory Classification, Retrieval, and Synthesis in Question Answering
Long-term memory plays a critical role in personal interaction, since long-term memory can better leverage world knowledge, historical information, and preferences in dialogues. Our research introduces PerLTQA, an innovative QA datas…
Learning to Edit: Aligning LLMs with Knowledge Editing
Knowledge editing techniques, aiming to efficiently modify a minor proportion of knowledge in large language models (LLMs) without negatively impacting performance across other inputs, have garnered widespread attention. However, existing …
Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios
The recent trend of using Large Language Models (LLMs) as tool agents in real-world applications underscores the necessity for comprehensive evaluations of their capabilities, particularly in complex scenarios involving planning, creating,…
YODA: Teacher-Student Progressive Learning for Language Models
Although large language models (LLMs) have demonstrated adeptness in a range of tasks, they still lag behind human learning efficiency. This disparity is often linked to the inherent human capacity to learn from basic examples, gradually g…
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model
Large language models (LLMs) have shown remarkable proficiency in human-level reasoning and generation capabilities, which encourages extensive research on their application in mathematical problem solving. However, current work has been l…
Data Management For Training Large Language Models: A Survey
Data plays a fundamental role in training Large Language Models (LLMs). Efficient data management, particularly in formulating a well-suited training dataset, is significant for enhancing model performance and improving training efficiency…
FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models
The ability to follow instructions is crucial for Large Language Models (LLMs) to handle various real-world applications. Existing benchmarks primarily focus on evaluating pure response quality, rather than assessing whether the response f…
Assessing and Enhancing the Robustness of Large Language Models with Task Structure Variations for Logical Reasoning
Large language models (LLMs), such as LLaMA, Alpaca, Vicuna, GPT-3.5 and GPT-4, have advanced the performance of AI systems on various natural language processing tasks to human-like levels. However, their generalisation and robustness whe…
SELF: Self-Evolution with Language Feedback
Large Language Models (LLMs) have demonstrated remarkable versatility across various domains. To further advance LLMs, we propose 'SELF' (Self-Evolution with Language Feedback), a novel approach that enables LLMs to self-improve through se…
Adaptive-Solver Framework for Dynamic Strategy Selection in Large Language Model Reasoning
Large Language Models (LLMs) demonstrate impressive ability in handling reasoning tasks. However, unlike humans, who can instinctively adapt their problem-solving strategies to the complexity of the task, most LLM-based methods adopt a one-size…
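The one-size-fits-all contrast the abstract draws can be made concrete with a minimal dispatcher that routes a problem to a cheap or expensive strategy based on an estimated difficulty; the heuristic and solver stubs below are illustrative, not the framework's actual components.

```python
# Schematic of dynamic strategy selection: route a problem to a cheap
# or expensive solving strategy based on a difficulty estimate. The
# heuristic and solver stubs are illustrative placeholders.
def estimate_difficulty(problem: str) -> int:
    return len(problem.split())  # crude proxy: longer prompt = harder

def direct_answer(problem: str) -> str:
    return f"[fast path] {problem}"

def chain_of_thought(problem: str) -> str:
    return f"[step-by-step path] {problem}"

def solve(problem: str) -> str:
    solver = direct_answer if estimate_difficulty(problem) < 8 else chain_of_thought
    return solver(problem)

print(solve("2 + 2?"))
print(solve("A train leaves city A at 60 km/h while another leaves city B ..."))
```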
Exploring Iterative Enhancement for Improving Learnersourced Multiple-Choice Question Explanations with Large Language Models
Large language models exhibit superior capabilities in processing and understanding language, yet their applications in educational contexts remain underexplored. Learnersourcing enhances learning by engaging students in creating their own…
Aligning Large Language Models with Human: A Survey
Large Language Models (LLMs) trained on extensive textual corpora have emerged as leading solutions for a broad array of Natural Language Processing (NLP) tasks. Despite their notable performance, these models are prone to certain limitati…