Explanipedia

Enhancing Language Agent Strategic Reasoning through Self-Play in Adversarial Games Open

Yikai Zhang, Ye Rong, Siyu Yuan, Jiangjie Chen, Jian Xie, et al. · 2025

Existing language agents often encounter difficulties in dynamic adversarial games due to poor strategic reasoning. To mitigate this limitation, a promising approach is to allow agents to learn from game interactions automatically, without…

LogReasoner: Empowering LLMs with Expert-like Coarse-to-Fine Reasoning for Automated Log Analysis Open

Lipeng Ma, Yixuan Li, Weidong Yang, Mingjie Zhou, Xinyi Liu, et al. · 2025

Log analysis is crucial for monitoring system health and diagnosing failures in complex systems. Recent advances in large language models (LLMs) offer new opportunities for automated log analysis, leveraging their reasoning capabilities to…

CultureScope: A Dimensional Lens for Probing Cultural Understanding in LLMs Open

J. W. Zhang, Sihang Jiang, Shiwei Guo, Shisong Chen, Yanghua Xiao, et al. · 2025

As large language models (LLMs) are increasingly deployed in diverse cultural environments, evaluating their cultural understanding capability has become essential for ensuring trustworthy and culturally aligned applications. However, most…

INSEva: A Comprehensive Chinese Benchmark for Large Language Models in Insurance Open

Shisong Chen, Qian Zhu, Wenyan Yang, Chengyi Yang, Wang Zhong, et al. · 2025

Insurance, as a critical component of the global financial system, demands high standards of accuracy and reliability in AI applications. While existing benchmarks evaluate AI capabilities across various domains, they often fail to capture…

A Stitch in Time Saves Nine: Proactive Self-Refinement for Language Models Open

Han Jiang, X. Wang, Haiquan Zhao, T. Li, Zeyang Jiang, et al. · 2025

Recent advances in self-refinement have demonstrated significant potential for improving the outputs of large language models (LLMs) through iterative refinement. However, most existing self-refinement methods rely on a reactive process wi…

Mind the Generation Process: Fine-Grained Confidence Estimation During LLM Generation Open

Han Jiang, Tingyun Li, Shisong Chen, Jie Shi, Xinyi Wang, et al. · 2025

While large language models (LLMs) have demonstrated remarkable performance across diverse tasks, they fundamentally lack self-awareness and frequently exhibit overconfidence, assigning high confidence scores to incorrect predictions. Accu…

DeepSeek: Paradigm Shifts and Technical Evolution in Large AI Models Open

Luolin Xiong, Haofen Wang, Xi Chen, Lu Sheng, Yun Xiong, et al. · 2025

DeepSeek, a Chinese Artificial Intelligence (AI) startup, has released their V3 and R1 series models, which attracted global attention due to their low cost, high performance, and open-source advantages. This paper begins by reviewing the …

AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need Open

Zhouhong Gu, Xiaoxuan Zhu, Yi Cai, H. F. Shen, Xingzhou Chen, et al. · 2025

Large language model based multi-agent systems have demonstrated significant potential in social simulation and complex task resolution domains. However, current frameworks face critical challenges in system architecture design, cross-doma…

ARIA: Training Language Agents with Intention-Driven Reward Aggregation Open

Ruihan Yang, Yikai Zhang, Aili Chen, Xintao Wang, Siyu Yuan, et al. · 2025

Large language models (LLMs) have enabled agents to perform complex reasoning and decision-making through free-form language interactions. However, in open-ended language action environments (e.g., negotiation or question-asking games), th…

Can LLMs Learn to Map the World from Local Descriptions? Open

Sirui Xia, Aili Chen, Xintao Wang, Tinghui Zhu, Yaozhong Zhang, et al. · 2025

Recent advances in Large Language Models (LLMs) have demonstrated strong capabilities in tasks such as code and mathematics. However, their potential to internalize structured spatial knowledge remains underexplored. This study investigate…

Deep learning-based quantity of talent demand prediction Open

Lei Qiao, Zhihe Wu, Yanghua Xiao, Zhihao Wang, Ze Zhou, et al. · 2025

Attributive Reasoning for Hallucination Diagnosis of Large Language Models Open

Yuyan Chen, Zehao Li, Shibing You, Zhengyu Chen, Jingwen Chang, et al. · 2025

In recent years, large language models (LLMs) have demonstrated outstanding capabilities in various tasks. However, LLMs also have various drawbacks, especially hallucination. Hallucination refers to the generation of content that does not…

MCiteBench: A Multimodal Benchmark for Generating Text with Citations Open

Caiyu Hu, Yikai Zhang, Tinghui Zhu, Yiwei Ye, Yanghua Xiao · 2025

Multimodal Large Language Models (MLLMs) have advanced in integrating diverse modalities but frequently suffer from hallucination. A promising solution to mitigate this issue is to generate text with citations, providing a transparent chai…

Reward Shaping to Mitigate Reward Hacking in RLHF Open

Jiayi Fu, Xuandong Zhao, Chengyuan Yao, Heng Wang, Han Qi, et al. · 2025

Reinforcement Learning from Human Feedback (RLHF) is essential for aligning large language models (LLMs) with human values. However, RLHF is susceptible to reward hacking, where the agent exploits flaws in the reward function rather…

DEEPER Insight into Your User: Directed Persona Refinement for Dynamic Persona Modeling Open

Aili Chen, Chengyu Du, Jiangjie Chen, Jinghan Xu, Yikai Zhang, et al. · 2025

To advance personalized applications such as recommendation systems and user behavior prediction, recent research increasingly adopts large language models (LLMs) for human-readable persona modeling. In dynamic real-world scenarios, effe…

CoSER: Coordinating LLM-Based Persona Simulation of Established Roles Open

Xintao Wang, Heng Wang, Yifei Zhang, Xinfeng Yuan, Rui‐Hua Xu, et al. · 2025

Role-playing language agents (RPLAs) have emerged as promising applications of large language models (LLMs). However, simulating established characters presents a challenging task for RPLAs, due to the lack of authentic character datasets …

AdaptiveLog: An Adaptive Log Analysis Framework with the Collaboration of Large and Small Language Model Open

Lipeng Ma, Weidong Yang, Yixuan Li, Ben Fei, Mingjie Zhou, et al. · 2025

Automated log analysis is crucial to ensure high availability and reliability of complex systems. The advent of LLMs in NLP has ushered in a new era of language model-driven automated log analysis, garnering significant interest. Within th…

CDS: Knowledge Component-Driven Data Synthesis Guided by Cognitive Diagnosis Theory Open

Haokun Zhao, Han Jiang, Jiaqing Liang, Yanghua Xiao · 2025

Large Language Models (LLMs) have achieved significant advancements, but the increasing complexity of tasks and higher performance demands highlight the need for continuous improvement. Some approaches utilize synthetic data generated by a…

Teaching Large Language Models to Express Knowledge Boundary from Their Own Signals Open

Lida Chen, Zujie Liang, Xintao Wang, Jiaqing Liang, Yanghua Xiao, et al. · 2025

Ground Every Sentence: Improving Retrieval-Augmented LLMs with Interleaved Reference-Claim Generation Open

Sirui Xia, Xintao Wang, Jiaqing Liang, Yifei Zhang, Weikang Zhou, et al. · 2025

BOOKWORLD: From Novels to Interactive Agent Societies for Story Creation Open

Yiting Ran, Xintao Wang, Tian Qiu, Jiaqing Liang, Yanghua Xiao, et al. · 2025

Step-by-Step Mastery: Enhancing Soft Constraint Following Ability of Large Language Models Open

Qingyu Ren, Jie Zeng, Qianyu He, Jiaqing Liang, Yanghua Xiao, et al. · 2025

CDS: Data Synthesis Method Guided by Cognitive Diagnosis Theory Open

Haokun Zhao, Han Jiang, Jiaqing Liang, Yanghua Xiao, Xiaojun Meng, et al. · 2025

Dialect-SQL: An Adaptive Framework for Bridging the Dialect Gap in Text-to-SQL Open

Jie Shi, Xuetao Cao, Bo Xu, Jiaqing Liang, Yanghua Xiao, et al. · 2025

Past Meets Present: Creating Historical Analogy with Large Language Models Open

Nianqi Li, Siyu Yuan, Jiangjie Chen, Jiaqing Liang, Wei Feng, et al. · 2025

Skeletons Matter: Dynamic Data Augmentation for Text-to-Query Open

Yan Ji, Bo Xu, Jie Shi, Jiaqing Liang, Deqing Yang, et al. · 2025

The task of translating natural language questions into query languages has long been a central focus in semantic parsing. Recent advancements in Large Language Models (LLMs) have significantly accelerated progress in this field. However, …

From Remembering to Metacognition: Do Existing Benchmarks Accurately Evaluate LLMs? Open

Geng Zhang, Yi Ying, Sihang Jiang, Jiaqing Liang, G. K. Yue, et al. · 2025

Revealing the Barriers of Language Agents in Planning Open

Jian Xie, Kexun Zhang, Jiangjie Chen, Siyu Yuan, Kai Zhang, et al. · 2025

StrucText-Eval: Evaluating Large Language Model’s Reasoning Ability in Structure-Rich Text Open

Zhouhong Gu, Haoning Ye, Xingzhou Chen, Zeyang Zhou, Hongwei Feng, et al. · 2025

Yanghua Xiao