Xiaojun Wan
JointCQ: Improving Factual Hallucination Detection with Joint Claim and Query Generation
Current large language models (LLMs) often suffer from hallucination issues, i.e., generating content that appears factual but is actually unreliable. A typical hallucination detection pipeline involves response decomposition (i.e., claim e…
HAD: HAllucination Detection Language Models Based on a Comprehensive Hallucination Taxonomy
The increasing reliance on natural language generation (NLG) models, particularly large language models, has raised concerns about the reliability and accuracy of their outputs. A key challenge is hallucination, where models produce plausi…
LoaQ: Layer-wise Output Approximation Quantization
A natural and intuitive idea in model quantization is to approximate each component's quantized output to match its original. Layer-wise post-training quantization (PTQ), though based on this idea, adopts a strictly local view and can achi…
Exploring Causal Effect of Social Bias on Faithfulness Hallucinations in Large Language Models
Large language models (LLMs) have achieved remarkable success in various tasks, yet they remain vulnerable to faithfulness hallucinations, where the output does not align with the input. In this study, we investigate whether social bias co…
Circadian Rhythm Genes-based Prognostic Signature for Bladder Cancer: Association of EZH2 Expression with Anesthetic-related Changes in Circulating Tumor Cells
Introduction: Circadian rhythm genes (CRGs) play a significant role in the pathogenesis of various cancers, yet their impact on bladder cancer (BC) remains to be fully elucidated. EZH2, as a potential oncological biomarker, lacks clear del…
ICR Probe: Tracking Hidden State Dynamics for Reliable Hallucination Detection in LLMs
Large language models (LLMs) excel at various natural language processing tasks, but their tendency to generate hallucinations undermines their reliability. Existing hallucination detection methods leveraging hidden states predominantly fo…
Enhancing LLM Watermark Resilience Against Both Scrubbing and Spoofing Attacks
Watermarking is a promising defense against the misuse of large language models (LLMs), yet it remains vulnerable to scrubbing and spoofing attacks. This vulnerability stems from an inherent trade-off governed by watermark window size: sma…
Minos: A Multimodal Evaluation Model for Bidirectional Generation Between Image and Text
Evaluation is important for multimodal generation tasks. With the rapid progress of MLLMs, there is growing interest in applying MLLMs to build general evaluation systems. However, existing work overlooks two aspects: (1) the development o…
NeUQI: Near-Optimal Uniform Quantization Parameter Initialization
Large language models (LLMs) achieve impressive performance across domains but face significant challenges when deployed on consumer-grade GPUs or personal devices such as laptops, due to high memory consumption and inference costs. Post-t…
AGENT-X: Adaptive Guideline-based Expert Network for Threshold-free AI-generated teXt detection
Existing AI-generated text detection methods heavily depend on large annotated datasets and external threshold tuning, restricting interpretability, adaptability, and zero-shot effectiveness. To address these limitations, we propose AGENT-…
ICIs-Associated Adverse Events in Patients with Advanced or Metastatic Renal cell Carcinoma: A Systematic Review and Meta-Analysis
Analyzing Cognitive Differences Among Large Language Models through the Lens of Social Worldview
Large Language Models (LLMs) have become integral to daily life, widely adopted in communication, decision-making, and information retrieval, raising critical questions about how these systems implicitly form and express socio-cognitive at…
C-FAITH: A Chinese Fine-Grained Benchmark for Automated Hallucination Evaluation
Despite the rapid advancement of large language models, they remain highly susceptible to generating hallucinations, which significantly hinders their widespread application. Hallucination research requires dynamic and fine-grained evaluat…
DSGram: Dynamic Weighting Sub-Metrics for Grammatical Error Correction in the Era of Large Language Models
Evaluating the performance of Grammatical Error Correction (GEC) models has become increasingly challenging, as large language model (LLM)-based GEC systems often produce corrections that diverge from provided gold references. This discrep…
Exploring the Multilingual NLG Evaluation Abilities of LLM-Based Evaluators
Previous research has shown that LLMs have potential in multilingual NLG evaluation tasks. However, existing research has not fully explored the differences in the evaluation capabilities of LLMs across different languages. To this end, th…
A Dual-Perspective NLG Meta-Evaluation Framework with Automatic Benchmark and Better Interpretability
In NLG meta-evaluation, evaluation metrics are typically assessed based on their consistency with humans. However, we identify some limitations in traditional NLG meta-evaluation approaches, such as issues in handling human ratings and amb…
Evaluating Self-Generated Documents for Enhancing Retrieval-Augmented Generation with Large Language Models
B4: A Black-Box Scrubbing Attack on LLM Watermarks
DAMON: A Dialogue-Aware MCTS Framework for Jailbreaking Large Language Models
MC-MKE: A Fine-Grained Multimodal Knowledge Editing Benchmark Emphasizing Modality Consistency
WaterPool: A Language Model Watermark Mitigating Trade-Offs among Imperceptibility, Efficacy and Robustness
Exploring and Evaluating Multimodal Knowledge Reasoning Consistency of Multimodal Large Language Models
Analyzing and Evaluating Correlation Measures in NLG Meta-Evaluation
Towards A “Novel” Benchmark: Evaluating Literary Fiction with Large Language Models
Tracing Training Footprints: A Calibration Approach for Membership Inference Attacks Against Multimodal Large Language Models
Who Writes What: Unveiling the Impact of Author Roles on AI-generated Text Detection
Re-evaluating Automatic LLM System Ranking for Alignment with Human Preference
Gödel Agent: A Self-Referential Agent Framework for Recursively Self-Improvement
R-Bind: Unified Enhancement of Attribute and Relation Binding in Text-to-Image Diffusion Models
TriEmbed: Bridge the Gap between Text and Token Indices with Embedding Reparameterization