Wanjun Zhong
Reverse-Engineered Reasoning for Open-Ended Generation
While the "deep reasoning" paradigm has spurred significant advances in verifiable domains like mathematics, its application to open-ended, creative generation remains a critical challenge. The two dominant methods for instilling reasoni…
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
The development of autonomous agents for graphical user interfaces (GUIs) presents major challenges in artificial intelligence. While recent advances in native agent models have shown promise by unifying perception, reasoning, action, and …
LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation
Code generation aims to automatically generate code from input requirements, significantly enhancing development efficiency. Recent large language model (LLM)-based approaches have shown promising results and revolutionized code generati…
AdaCtrl: Towards Adaptive and Controllable Reasoning via Difficulty-Aware Budgeting
Modern large reasoning models demonstrate impressive problem-solving capabilities by employing sophisticated reasoning strategies. However, they often struggle to balance efficiency and effectiveness, frequently generating unnecessarily le…
Self-Reasoning Language Models: Unfold Hidden Reasoning Chains with Few Reasoning Catalyst
Inference-time scaling has attracted much attention, as it significantly enhances the performance of Large Language Models (LLMs) on complex reasoning tasks by increasing the length of the Chain-of-Thought. These longer intermediate reasoning ra…
Acting Less is Reasoning More! Teaching Model to Act Efficiently
Tool-integrated reasoning (TIR) augments large language models (LLMs) with the ability to invoke external tools during long-form reasoning, such as search engines and code interpreters, to solve tasks beyond the capabilities of internal re…
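As a rough illustration of the tool-integrated reasoning pattern described above (a generic reason-act loop, not this paper's training recipe), the sketch below lets a model interleave text with tool calls; `call_model`, the `<tool>` tag syntax, and the toy interpreter are all hypothetical placeholders.

```python
# Minimal sketch of a tool-integrated reasoning (TIR) loop. `call_model`
# is a scripted stand-in for an LLM API; the <tool> tag syntax and the
# toy "code interpreter" are illustrative, not this paper's protocol.
import re

def run_python(code: str) -> str:
    # Toy code interpreter: evaluates a single arithmetic expression.
    return str(eval(code, {"__builtins__": {}}, {}))

TOOLS = {"python": run_python}

def call_model(transcript: str) -> str:
    # Placeholder policy: emit one tool call, then a final answer.
    if "<result>" not in transcript:
        return 'Let me compute. <tool name="python">2**10</tool>'
    return "The answer is 1024."

def tir_loop(question: str, max_steps: int = 5) -> str:
    transcript = question
    for _ in range(max_steps):
        step = call_model(transcript)
        match = re.search(r'<tool name="(\w+)">(.*?)</tool>', step, re.S)
        if not match:                       # no tool call -> final answer
            return step
        name, args = match.groups()
        result = TOOLS[name](args)          # invoke the external tool
        transcript += f"\n{step}\n<result>{result}</result>"
    return transcript

print(tir_loop("What is 2 to the 10th power?"))  # -> The answer is 1024.
```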
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
While reasoning models (e.g., DeepSeek R1) trained with reinforcement learning (RL) excel in textual reasoning, they struggle in scenarios requiring structured problem-solving, such as geometric reasoning, concise computation, or complex …
Exploring Iterative Enhancement for Improving Learnersourced Multiple-Choice Question Explanations with Large Language Models
Large language models (LLMs) have demonstrated strong capabilities in language understanding and generation, and their potential in educational contexts is increasingly being explored. One promising area is learnersourcing, where students …
Empowering Self-Learning of LLMs: Inner Knowledge Explicitation as a Catalyst
Self-learning of Large Language Models (LLMs) facilitates their advancement towards super-intelligence by training with self-synthesized experiences. However, a critical challenge is the amplification of hallucinations in generated data du…
UI-TARS: Pioneering Automated GUI Interaction with Native Agents
This paper introduces UI-TARS, a native GUI agent model that solely perceives the screenshots as input and performs human-like interactions (e.g., keyboard and mouse operations). Unlike prevailing agent frameworks that depend on heavily wr…
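The interface the abstract outlines (screenshots in, human-like keyboard and mouse actions out) can be sketched schematically as below; the action dataclasses and the stubbed `predict_action` policy are hypothetical stand-ins for the actual vision-language model, not UI-TARS internals.

```python
# Schematic of a "native" GUI agent interface: raw pixels in, low-level
# actions out. The action types and stubbed policy are illustrative
# placeholders, not UI-TARS internals.
from dataclasses import dataclass

@dataclass
class Click:
    x: int
    y: int

@dataclass
class TypeText:
    text: str

def predict_action(screenshot: bytes, goal: str):
    # Stand-in for the agent model: a real native agent would run a
    # vision-language model over the screenshot here.
    return Click(x=400, y=300)

def agent_loop(goal: str, max_steps: int = 3):
    for step in range(max_steps):
        screenshot = b"\x89PNG..."       # stand-in for a screen capture
        action = predict_action(screenshot, goal)
        print(f"step {step}: {action}")  # a real loop would execute it

agent_loop("open the settings menu")
```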
AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions
Data science tasks involving tabular data present complex challenges that require sophisticated problem-solving approaches. We propose AutoKaggle, a powerful and user-centric framework that assists data scientists in completing daily data …
Agents in Software Engineering: Survey, Landscape, and Vision
In recent years, Large Language Models (LLMs) have achieved remarkable success and have been widely used in various downstream tasks, especially in tasks of the software engineering (SE) field. We find that many studies combining LLMs …
You Augment Me: Exploring ChatGPT-based Data Augmentation for Semantic Code Search
Code search plays a crucial role in software development, enabling developers to retrieve and reuse code using natural language queries. While the performance of code search models improves with an increase in high-quality data, obtaining …
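To make the task concrete, here is a toy semantic code search ranker: it scores snippets against a natural-language query with bag-of-words cosine similarity, a crude stand-in for the learned embeddings such models actually use. The corpus and scoring are illustrative and unrelated to the paper's augmentation method.

```python
# Toy semantic code search: rank code snippets against a natural-language
# query by bag-of-words cosine similarity, a crude stand-in for the
# learned embeddings real code search models use.
import math
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().replace("_", " ").split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Each snippet is paired with a short docstring-like description.
corpus = {
    "def read_file(path): return open(path).read()": "read file contents",
    "def sort_items(xs): return sorted(xs)": "sort a list of items",
}

def search(query: str) -> str:
    q = embed(query)
    return max(corpus, key=lambda code: cosine(q, embed(corpus[code] + " " + code)))

print(search("how do I read a file"))  # -> the read_file snippet
```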
When to Stop? Towards Efficient Code Generation in LLMs with Excess Token Prevention
Code generation aims to automatically generate code snippets that meet given natural language requirements and plays an important role in software development. Although Code LLMs have shown excellent performance in this domain, their long …
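The abstract is cut off before the method, but the general idea of excess-token prevention can be illustrated with a toy early-stop rule: halt decoding once the emitted code parses as a complete snippet rather than running to the length limit. The scripted token stream below is hypothetical and this is not the paper's technique.

```python
# Toy illustration of excess-token prevention: stop decoding as soon as
# the generated code is syntactically complete, instead of letting the
# model run to its maximum length. The token stream is a scripted stub.
import ast

TOKENS = ["def add(a, b):", "\n    return a + b",
          "\n\n# redundant trailing", " chatter the model might emit"]

code = ""
for tok in TOKENS:
    code += tok
    try:
        ast.parse(code)   # parses cleanly -> snippet is complete, stop
        break
    except SyntaxError:
        continue          # still incomplete, keep decoding

print(code)  # only the complete function, no excess tokens
```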
Concise and Precise Context Compression for Tool-Using Language Models
Through reading the documentation in the context, tool-using language models can dynamically extend their capability using external tools. The cost is that we have to input lengthy documentation every time the model needs to use the tool, …
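A back-of-envelope calculation makes the cost concrete: because the full documentation must be re-sent on every tool call, even a modest compression ratio compounds over a session. All numbers below are hypothetical, not figures from the paper.

```python
# Back-of-envelope arithmetic for the cost the abstract points to:
# lengthy documentation is re-sent on every tool call, so a compressed
# substitute compounds into large savings. All numbers are hypothetical.
FULL_DOC_TOKENS = 1200    # length of one tool's documentation
COMPRESSED_TOKENS = 60    # length of a compressed substitute (20x)
CALLS = 500               # tool invocations over a session

naive = FULL_DOC_TOKENS * CALLS
compressed = COMPRESSED_TOKENS * CALLS
print(f"full docs every call: {naive:,} tokens")       # 600,000
print(f"compressed context:   {compressed:,} tokens")  # 30,000
print(f"saving:               {1 - compressed / naive:.0%}")  # 95%
```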
MemoryBank: Enhancing Large Language Models with Long-Term Memory
Large Language Models (LLMs) have drastically reshaped our interactions with artificial intelligence (AI) systems, showcasing impressive performance across an extensive array of tasks. Despite this, a notable hindrance remains—the deficien…
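The truncated abstract does not detail the mechanism, but the general shape of a long-term memory store can be sketched: append dated events, then retrieve the most relevant ones at response time. Keyword overlap below is a placeholder for real retrieval; this is an illustration of the general idea, not MemoryBank's actual design.

```python
# Generic sketch of a long-term memory store for a conversational agent:
# append dated events, retrieve the most relevant at response time.
# Keyword-overlap scoring stands in for real retrieval; this is not
# MemoryBank's design, which the truncated abstract does not describe.
from datetime import date

memory: list[tuple[date, str]] = []

def remember(text: str) -> None:
    memory.append((date.today(), text))

def recall(query: str, k: int = 2) -> list[tuple[date, str]]:
    words = set(query.lower().split())
    return sorted(memory,
                  key=lambda m: len(words & set(m[1].lower().split())),
                  reverse=True)[:k]

remember("User's dog is named Biscuit")
remember("User prefers metric units")
print(recall("what is my dog called", k=1))
```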
CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models
Developing Large Language Models (LLMs) with robust long-context capabilities has been a recent research focus, resulting in the emergence of long-context LLMs proficient in Chinese. However, the evaluation of these models remains underd…
PerLTQA: A Personal Long-Term Memory Dataset for Memory Classification, Retrieval, and Synthesis in Question Answering
Long-term memory plays a critical role in personal interaction, since long-term memory can better leverage world knowledge, historical information, and preferences in dialogues. Our research introduces PerLTQA, an innovative QA datas…
Learning to Edit: Aligning LLMs with Knowledge Editing
Knowledge editing techniques, aiming to efficiently modify a minor proportion of knowledge in large language models (LLMs) without negatively impacting performance across other inputs, have garnered widespread attention. However, existing …
Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios
The recent trend of using Large Language Models (LLMs) as tool agents in real-world applications underscores the necessity for comprehensive evaluations of their capabilities, particularly in complex scenarios involving planning, creating,…
YODA: Teacher-Student Progressive Learning for Language Models
Although large language models (LLMs) have demonstrated adeptness in a range of tasks, they still lag behind human learning efficiency. This disparity is often linked to the inherent human capacity to learn from basic examples, gradually g…
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model
Large language models (LLMs) have shown remarkable proficiency in human-level reasoning and generation capabilities, which encourages extensive research on their application in mathematical problem solving. However, current work has been l…
Data Management For Training Large Language Models: A Survey
Data plays a fundamental role in training Large Language Models (LLMs). Efficient data management, particularly in formulating a well-suited training dataset, is significant for enhancing model performance and improving training efficiency…
FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models
The ability to follow instructions is crucial for Large Language Models (LLMs) to handle various real-world applications. Existing benchmarks primarily focus on evaluating pure response quality, rather than assessing whether the response f…
Assessing and Enhancing the Robustness of Large Language Models with Task Structure Variations for Logical Reasoning
Large language models (LLMs), such as LLaMA, Alpaca, Vicuna, GPT-3.5 and GPT-4, have advanced the performance of AI systems on various natural language processing tasks to human-like levels. However, their generalisation and robustness whe…
SELF: Self-Evolution with Language Feedback
Large Language Models (LLMs) have demonstrated remarkable versatility across various domains. To further advance LLMs, we propose 'SELF' (Self-Evolution with Language Feedback), a novel approach that enables LLMs to self-improve through se…
Adaptive-Solver Framework for Dynamic Strategy Selection in Large Language Model Reasoning
Large Language Models (LLMs) demonstrate impressive ability in handling reasoning tasks. However, unlike humans, who can instinctively adapt their problem-solving strategies to the complexity of the task, most LLM-based methods adopt a one-size…
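The one-size-fits-all contrast the abstract draws can be made concrete with a minimal dispatcher that routes a problem to a cheap or expensive strategy based on an estimated difficulty; the heuristic and solver stubs below are illustrative, not the framework's actual components.

```python
# Schematic of dynamic strategy selection: route a problem to a cheap
# or expensive solving strategy based on a difficulty estimate. The
# heuristic and solver stubs are illustrative placeholders.
def estimate_difficulty(problem: str) -> int:
    return len(problem.split())  # crude proxy: longer prompt = harder

def direct_answer(problem: str) -> str:
    return f"[fast path] {problem}"

def chain_of_thought(problem: str) -> str:
    return f"[step-by-step path] {problem}"

def solve(problem: str) -> str:
    solver = direct_answer if estimate_difficulty(problem) < 8 else chain_of_thought
    return solver(problem)

print(solve("2 + 2?"))
print(solve("A train leaves city A at 60 km/h while another leaves city B ..."))
```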
Exploring Iterative Enhancement for Improving Learnersourced Multiple-Choice Question Explanations with Large Language Models
Large language models exhibit superior capabilities in processing and understanding language, yet their applications in educational contexts remain underexplored. Learnersourcing enhances learning by engaging students in creating their own…
Aligning Large Language Models with Human: A Survey
Large Language Models (LLMs) trained on extensive textual corpora have emerged as leading solutions for a broad array of Natural Language Processing (NLP) tasks. Despite their notable performance, these models are prone to certain limitati…