Xinya Du
CLUE: Non-parametric Verification from Experience via Hidden-State Clustering
Assessing the quality of Large Language Model (LLM) outputs presents a critical challenge. Previous methods either rely on text-level information (e.g., reward models, majority voting), which can overfit to superficial cues, or on calibrat…
Dr.V: A Hierarchical Perception-Temporal-Cognition Framework to Diagnose Video Hallucination by Fine-grained Spatial-Temporal Grounding
Recent advancements in large video models (LVMs) have significantly enhanced video understanding. However, these models continue to suffer from hallucinations, producing content that conflicts with input videos. To address this issue, we pr…
A Comprehensive Analysis for Visual Object Hallucination in Large Vision-Language Models
Large Vision-Language Models (LVLMs) demonstrate remarkable capabilities in multimodal tasks, but visual object hallucination remains a persistent issue. It refers to scenarios where models generate inaccurate visual object-related informa…
Multimodal Reference Visual Grounding
Visual grounding focuses on detecting objects from images based on language expressions. Recent Large Vision-Language Models (LVLMs) have significantly advanced visual grounding performance by training large models with large-scale dataset…
LDC: Learning to Generate Research Idea with Dynamic Control
Recent advancements in large language models (LLMs) have demonstrated their potential in automating scientific research ideation. Existing approaches primarily focus on prompting techniques, often producing ideas misaligned with expert…
FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning
Hallucinations in large language models (LLMs) pose significant challenges in tasks requiring complex multi-step reasoning, such as mathematical problem-solving. Existing approaches primarily detect the presence of hallucinations but lack …
Document-level Causal Relation Extraction with Knowledge-guided Binary Question Answering
As an essential task in information extraction (IE), Event-Event Causal Relation Extraction (ECRE) aims to identify and classify the causal relationships between event mentions in natural language texts. However, existing research on ECRE …
FIHA: Autonomous Hallucination Evaluation in Vision-Language Models with Davidson Scene Graphs
The rapid development of Large Vision-Language Models (LVLMs) often comes with widespread hallucination issues, making cost-effective and comprehensive assessments increasingly vital. Current approaches mainly rely on costly annotations an…
MLR-Copilot: Autonomous Machine Learning Research based on Large Language Models Agents
Autonomous machine learning research has gained significant attention recently. We present MLR-COPILOT, an autonomous Machine Learning Research framework powered by large language model agents. The system is designed to enhance ML research…
IQA-EVAL: Automatic Evaluation of Human-Model Interactive Question Answering
To evaluate Large Language Models (LLMs) for question answering (QA), traditional methods typically focus on assessing single-turn responses to given questions. However, this approach doesn't capture the dynamic nature of human-AI interact…
FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback
Large Vision-Language Models (LVLMs) have demonstrated proficiency in tackling a variety of visual-language tasks. However, current LVLMs suffer from misalignment between text and image modalities which causes three kinds of hallucination …
Making Natural Language Reasoning Explainable and Faithful
Neural models, including large language models (LLMs), achieve superior performance on logical reasoning tasks such as question answering. To elicit reasoning capabilities from LLMs, recent works propose using the chain-of-thought (CoT) me…
Leveraging Structured Information for Explainable Multi-hop Question Answering and Reasoning
Neural models, including large language models (LLMs), achieve superior performance on multi-hop question-answering. To elicit reasoning capabilities from LLMs, recent works propose using the chain-of-thought (CoT) mechanism to generate bo…
POE: Process of Elimination for Multiple Choice Reasoning
Language models (LMs) are capable of conducting in-context learning for multiple choice reasoning tasks, but the options in these tasks are treated equally. As humans often first eliminate wrong options before picking the final correct ans…
Probing Representations for Document-level Event Extraction
The probing classifiers framework has been employed for interpreting deep neural network models for a variety of natural language processing (NLP) applications. Studies, however, have largely focused on sentence-level NLP tasks. This work i…
AGent: A Novel Pipeline for Automatically Creating Unanswerable Questions
The development of large high-quality datasets and high-performing models has led to significant advancements in the domain of Extractive Question Answering (EQA). This progress has sparked considerable interest in exploring unanswerable …
PRD: Peer Rank and Discussion Improve Large Language Model based Evaluations
Nowadays, the quality of responses generated by different modern large language models (LLMs) is hard to evaluate and compare automatically. Recent studies suggest and predominantly use LLMs for reference-free evaluation of open-ended ques…
End-to-end Case-Based Reasoning for Commonsense Knowledge Base Completion
Pretrained language models have been shown to store knowledge in their parameters and have achieved reasonable performance in commonsense knowledge base completion (CKBC) tasks. However, CKBC is knowledge-intensive and it is reported that …
Toward Consistent and Informative Event-Event Temporal Relation Extraction
Event-event temporal relation extraction aims to extract the temporal order between a pair of event mentions, which is usually used to construct temporal event graphs. However, event graphs generated by existing methods are usually globall…
Automatic Error Analysis for Document-level Information Extraction
Document-level information extraction (IE) tasks have recently begun to be revisited in earnest using the end-to-end neural network techniques that have been successful on their sentence-level IE counterparts. Evaluation of the approaches,…
Few-shot Intent Classification and Slot Filling with Retrieved Examples
Few-shot learning arises in important practical scenarios, such as when a natural language understanding system needs to learn new semantic labels for an emerging, resource-scarce domain. In this paper, we explore retrieval-based methods f…
Dian Yu, Luheng He, Yuan Zhang, Xinya Du, Panupong Pasupat, Qi Li. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021.
Template Filling with Generative Transformers
Template filling is generally tackled by a pipeline of two separate supervised systems – one for role-filler extraction and another for template/event recognition. Since pipelines consider events in isolation, they can suffer from error pr…