Yanghua Xiao
Enhancing Language Agent Strategic Reasoning through Self-Play in Adversarial Games
Existing language agents often encounter difficulties in dynamic adversarial games due to poor strategic reasoning. To mitigate this limitation, a promising approach is to allow agents to learn from game interactions automatically, without…
LogReasoner: Empowering LLMs with Expert-like Coarse-to-Fine Reasoning for Automated Log Analysis
Log analysis is crucial for monitoring system health and diagnosing failures in complex systems. Recent advances in large language models (LLMs) offer new opportunities for automated log analysis, leveraging their reasoning capabilities to…
CultureScope: A Dimensional Lens for Probing Cultural Understanding in LLMs
As large language models (LLMs) are increasingly deployed in diverse cultural environments, evaluating their cultural understanding capability has become essential for ensuring trustworthy and culturally aligned applications. However, most…
INSEva: A Comprehensive Chinese Benchmark for Large Language Models in Insurance
Insurance, as a critical component of the global financial system, demands high standards of accuracy and reliability in AI applications. While existing benchmarks evaluate AI capabilities across various domains, they often fail to capture…
A Stitch in Time Saves Nine: Proactive Self-Refinement for Language Models
Recent advances in self-refinement have demonstrated significant potential for improving the outputs of large language models (LLMs) through iterative refinement. However, most existing self-refinement methods rely on a reactive process wi…
Mind the Generation Process: Fine-Grained Confidence Estimation During LLM Generation
While large language models (LLMs) have demonstrated remarkable performance across diverse tasks, they fundamentally lack self-awareness and frequently exhibit overconfidence, assigning high confidence scores to incorrect predictions. Accu…
DeepSeek: Paradigm Shifts and Technical Evolution in Large AI Models
DeepSeek, a Chinese Artificial Intelligence (AI) startup, has released their V3 and R1 series models, which attracted global attention due to their low cost, high performance, and open-source advantages. This paper begins by reviewing the …
AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need
Large language model based multi-agent systems have demonstrated significant potential in social simulation and complex task resolution domains. However, current frameworks face critical challenges in system architecture design, cross-doma…
ARIA: Training Language Agents with Intention-Driven Reward Aggregation
Large language models (LLMs) have enabled agents to perform complex reasoning and decision-making through free-form language interactions. However, in open-ended language action environments (e.g., negotiation or question-asking games), th…
Can LLMs Learn to Map the World from Local Descriptions?
Recent advances in Large Language Models (LLMs) have demonstrated strong capabilities in tasks such as code and mathematics. However, their potential to internalize structured spatial knowledge remains underexplored. This study investigate…
Deep learning-based quantity of talent demand prediction
Attributive Reasoning for Hallucination Diagnosis of Large Language Models
In recent years, large language models (LLMs) have demonstrated outstanding capabilities in various tasks. However, LLMs also have various drawbacks, especially hallucination. Hallucination refers to the generation of content that does not…
MCiteBench: A Multimodal Benchmark for Generating Text with Citations
Multimodal Large Language Models (MLLMs) have advanced in integrating diverse modalities but frequently suffer from hallucination. A promising solution to mitigate this issue is to generate text with citations, providing a transparent chai…
Reward Shaping to Mitigate Reward Hacking in RLHF
Reinforcement Learning from Human Feedback (RLHF) is essential for aligning large language models (LLMs) with human values. However, RLHF is susceptible to reward hacking, where the agent exploits flaws in the reward function rather…
DEEPER Insight into Your User: Directed Persona Refinement for Dynamic Persona Modeling
To advance personalized applications such as recommendation systems and user behavior prediction, recent research increasingly adopts large language models (LLMs) for human-readable persona modeling. In dynamic real-world scenarios, effe…
CoSER: Coordinating LLM-Based Persona Simulation of Established Roles
Role-playing language agents (RPLAs) have emerged as promising applications of large language models (LLMs). However, simulating established characters presents a challenging task for RPLAs, due to the lack of authentic character datasets …
AdaptiveLog: An Adaptive Log Analysis Framework with the Collaboration of Large and Small Language Model
Automated log analysis is crucial to ensure high availability and reliability of complex systems. The advent of LLMs in NLP has ushered in a new era of language model-driven automated log analysis, garnering significant interest. Within th…
CDS: Knowledge Component-Driven Data Synthesis Guided by Cognitive Diagnosis Theory
Large Language Models (LLMs) have achieved significant advancements, but the increasing complexity of tasks and higher performance demands highlight the need for continuous improvement. Some approaches utilize synthetic data generated by a…
Teaching Large Language Models to Express Knowledge Boundary from Their Own Signals
Ground Every Sentence: Improving Retrieval-Augmented LLMs with Interleaved Reference-Claim Generation
BOOKWORLD: From Novels to Interactive Agent Societies for Story Creation
Step-by-Step Mastery: Enhancing Soft Constraint Following Ability of Large Language Models
CDS: Data Synthesis Method Guided by Cognitive Diagnosis Theory
Dialect-SQL: An Adaptive Framework for Bridging the Dialect Gap in Text-to-SQL
Past Meets Present: Creating Historical Analogy with Large Language Models
Skeletons Matter: Dynamic Data Augmentation for Text-to-Query
The task of translating natural language questions into query languages has long been a central focus in semantic parsing. Recent advancements in Large Language Models (LLMs) have significantly accelerated progress in this field. However, …
From Remembering to Metacognition: Do Existing Benchmarks Accurately Evaluate LLMs?
Revealing the Barriers of Language Agents in Planning
StrucText-Eval: Evaluating Large Language Model’s Reasoning Ability in Structure-Rich Text