Xinya Du
CLUE: Non-parametric Verification from Experience via Hidden-State Clustering
Assessing the quality of Large Language Model (LLM) outputs presents a critical challenge. Previous methods either rely on text-level information (e.g., reward models, majority voting), which can overfit to superficial cues, or on calibrat…
Dr.V: A Hierarchical Perception-Temporal-Cognition Framework to Diagnose Video Hallucination by Fine-grained Spatial-Temporal Grounding
Recent advancements in large video models (LVMs) have significantly enhanced video understanding. However, these models continue to suffer from hallucinations, producing content that conflicts with input videos. To address this issue, we pr…
A Comprehensive Analysis for Visual Object Hallucination in Large Vision-Language Models
Large Vision-Language Models (LVLMs) demonstrate remarkable capabilities in multimodal tasks, but visual object hallucination remains a persistent issue. It refers to scenarios where models generate inaccurate visual object-related informa…
Multimodal Reference Visual Grounding
Visual grounding focuses on detecting objects from images based on language expressions. Recent Large Vision-Language Models (LVLMs) have significantly advanced visual grounding performance by training large models with large-scale dataset…
LDC: Learning to Generate Research Idea with Dynamic Control
Recent advancements in large language models (LLMs) have demonstrated their potential in automating scientific research ideation. Existing approaches primarily focus on prompting techniques, often producing ideas misaligned with expert…
FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning
Hallucinations in large language models (LLMs) pose significant challenges in tasks requiring complex multi-step reasoning, such as mathematical problem-solving. Existing approaches primarily detect the presence of hallucinations but lack …
Document-level Causal Relation Extraction with Knowledge-guided Binary Question Answering
As an essential task in information extraction (IE), Event-Event Causal Relation Extraction (ECRE) aims to identify and classify the causal relationships between event mentions in natural language texts. However, existing research on ECRE …
FIHA: Autonomous Hallucination Evaluation in Vision-Language Models with Davidson Scene Graphs
The rapid development of Large Vision-Language Models (LVLMs) often comes with widespread hallucination issues, making cost-effective and comprehensive assessments increasingly vital. Current approaches mainly rely on costly annotations an…
MLR-Copilot: Autonomous Machine Learning Research based on Large Language Models Agents
Autonomous machine learning research has gained significant attention recently. We present MLR-COPILOT, an autonomous Machine Learning Research framework powered by large language model agents. The system is designed to enhance ML research…
IQA-EVAL: Automatic Evaluation of Human-Model Interactive Question Answering
To evaluate Large Language Models (LLMs) for question answering (QA), traditional methods typically focus on assessing single-turn responses to given questions. However, this approach doesn't capture the dynamic nature of human-AI interact…
FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback
Large Vision-Language Models (LVLMs) have demonstrated proficiency in tackling a variety of visual-language tasks. However, current LVLMs suffer from misalignment between text and image modalities which causes three kinds of hallucination …
Making Natural Language Reasoning Explainable and Faithful
Neural models, including large language models (LLMs), achieve superior performance on logical reasoning tasks such as question answering. To elicit reasoning capabilities from LLMs, recent works propose using the chain-of-thought (CoT) me…
Leveraging Structured Information for Explainable Multi-hop Question Answering and Reasoning
Neural models, including large language models (LLMs), achieve superior performance on multi-hop question-answering. To elicit reasoning capabilities from LLMs, recent works propose using the chain-of-thought (CoT) mechanism to generate bo…
POE: Process of Elimination for Multiple Choice Reasoning
Language models (LMs) are capable of conducting in-context learning for multiple choice reasoning tasks, but the options in these tasks are treated equally. As humans often first eliminate wrong options before picking the final correct ans…
Probing Representations for Document-level Event Extraction
The probing classifiers framework has been employed for interpreting deep neural network models for a variety of natural language processing (NLP) applications. Studies, however, have largely focused on sentence-level NLP tasks. This work i…
AGent: A Novel Pipeline for Automatically Creating Unanswerable Questions
The development of large high-quality datasets and high-performing models has led to significant advancements in the domain of Extractive Question Answering (EQA). This progress has sparked considerable interest in exploring unanswerable …
PRD: Peer Rank and Discussion Improve Large Language Model based Evaluations
Nowadays, the quality of responses generated by different modern large language models (LLMs) is hard to evaluate and compare automatically. Recent studies suggest and predominantly use LLMs for reference-free evaluation of open-ended ques…
End-to-end Case-Based Reasoning for Commonsense Knowledge Base Completion
Pretrained language models have been shown to store knowledge in their parameters and have achieved reasonable performance in commonsense knowledge base completion (CKBC) tasks. However, CKBC is knowledge-intensive and it is reported that …
Toward Consistent and Informative Event-Event Temporal Relation Extraction
Event-event temporal relation extraction aims to extract the temporal order between a pair of event mentions, which is usually used to construct temporal event graphs. However, event graphs generated by existing methods are usually globall…
Automatic Error Analysis for Document-level Information Extraction
Document-level information extraction (IE) tasks have recently begun to be revisited in earnest using the end-to-end neural network techniques that have been successful on their sentence-level IE counterparts. Evaluation of the approaches,…
Few-shot Intent Classification and Slot Filling with Retrieved Examples
Few-shot learning arises in important practical scenarios, such as when a natural language understanding system needs to learn new semantic labels for an emerging, resource-scarce domain. In this paper, we explore retrieval-based methods f…
Dian Yu, Luheng He, Yuan Zhang, Xinya Du, Panupong Pasupat, Qi Li. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021.
Template Filling with Generative Transformers
Template filling is generally tackled by a pipeline of two separate supervised systems – one for role-filler extraction and another for template/event recognition. Since pipelines consider events in isolation, they can suffer from error pr…