Explanipedia

Can We Reliably Rank Model Performance across Domains without Labeled Data? Open

Veronica Rammouz, A. Reina González, Carlos Cruzportillo, Adrian Tan, Nicole Beebe, et al. · 2025

Estimating model performance without labels is an important goal for understanding how NLP models generalize. While prior work has proposed measures based on dataset similarity or predicted correctness, it remains unclear when these estima…

Role-Conditioned Refusals: Evaluating Access Control Reasoning in Large Language Models Open

Đorđe Klisura, Joseph Khoury, Ram Krishnan, Anthony Rios · 2025

Access control is a cornerstone of secure computing, yet large language models often blur role boundaries by producing unrestricted responses. We study role-conditioned refusals, focusing on the LLM's ability to adhere to access control po…

Bike Frames: Understanding the Implicit Portrayal of Cyclists in the News Open

Xingmeng Zhao, Dan Schumacher, Sashank Nalluri, Suhana Shrestha, Xavier Walton, et al. · 2025

Increasing cycling for transportation or recreation can boost public health and reduce the environmental impacts of vehicles. However, news agencies' ideologies and reporting styles often influence public perception of cycling. For example…

Does It Run and Is That Enough? Revisiting Text-to-Chart Generation with a Multi-Agent Approach Open

Anthony Rios · 2025

Large language models can translate natural-language chart descriptions into runnable code, yet approximately 15% of the generated scripts still fail to execute, even after supervised fine-tuning and reinforcement learning. We investigate…

A Multi-Agent Framework for Mitigating Dialect Biases in Privacy Policy Question-Answering Systems Open

Đorđe Klisura, Astrid R Bernaga Torres, Anna Karen Gárate-Escamilla, Rajesh Roshan Biswal, Ke Yang, et al. · 2025

Privacy policies inform users about data collection and usage, yet their complexity limits accessibility for diverse populations. Existing Privacy Policy Question Answering (QA) systems exhibit performance disparities across English dialec…

Charting the Future: Using Chart Question-Answering for Scalable Evaluation of LLM-Driven Data Visualizations Open

James M. Ford, Xingmeng Zhao, Dan Schumacher, Anthony Rios · 2024

Computer science, Mathematics

We propose a novel framework that leverages Visual Question Answering (VQA) models to automate the evaluation of LLM-generated data visualizations. Traditional evaluation methods often rely on human judgment, which is costly and unscalable…

Improving LLM Reasoning with Multi-Agent Tree-of-Thought Validator Agent Open

Fatemeh Haji, Mazal Bethany, Maryam Tabar, Jason Chiang, Anthony Rios, et al. · 2024

Computer science, Mathematics

Multi-agent strategies have emerged as a promising approach to enhance the reasoning abilities of Large Language Models (LLMs) by assigning specialized roles in the problem-solving process. Concurrently, Tree of Thoughts (ToT) methods have…

Enhancing Event Reasoning in Large Language Models through Instruction Fine-Tuning with Semantic Causal Graphs Open

Mazal Bethany, Emet Bethany, Brandon Wherry, Cho‐Yu Jason Chiang, Nishant Vishwamitra, et al. · 2024

Computer science, Psychology, Philosophy

Event detection and text reasoning have become critical applications across various domains. While LLMs have recently demonstrated impressive progress in reasoning abilities, they often struggle with event detection, particularly due to th…

RASTeR: Robust, Agentic, and Structured Temporal Reasoning Open

Dan Schumacher, Fatemeh Haji, T. C. Grey, Niharika Bandlamudi, Nupoor Karnik, et al. · 2024

Computer science, Geography, Philosophy

Temporal question answering (TQA) remains a challenge for large language models (LLMs), particularly when retrieved content may be irrelevant, outdated, or temporally inconsistent. This is especially critical in applications like clinical …

Beyond Text-to-SQL for IoT Defense: A Comprehensive Framework for Querying and Classifying IoT Threats Open

Ryan Pavlich, Nima Ebadi, Richard Tarbell, Billy Linares, Adrian Eng-Choon Tan, et al. · 2024

Computer science

Recognizing the promise of natural language interfaces to databases, prior studies have emphasized the development of text-to-SQL systems. While substantial progress has been made in this field, existing research has concentrated on genera…

Improving Expert Radiology Report Summarization by Prompting Large Language Models with a Layperson Summary Open

Xingmeng Zhao, Tongnian Wang, Anthony Rios · 2024

Computer science, Psychology, Medicine

Radiology report summarization (RRS) is crucial for patient care, requiring concise "Impressions" from detailed "Findings." This paper introduces a novel prompting strategy to enhance RRS by first generating a layperson summary. This appro…

Unmasking Database Vulnerabilities: Zero-Knowledge Schema Inference Attacks in Text-to-SQL Systems Open

Đorđe Klisura, Anthony Rios · 2024

Computer science, Philosophy

Text-to-SQL systems empower users to interact with databases using natural language, automatically translating queries into executable SQL code. However, their reliance on database schema information for SQL generation exposes them to sign…

Team UTSA-NLP at SemEval 2024 Task 5: Prompt Ensembling for Argument Reasoning in Civil Procedures with GPT4 Open

Dan Schumacher, Anthony Rios · 2024

Computer science, Chemistry, Economics

In this paper, we present our system for SemEval Task 5, the Legal Argument Reasoning Task in Civil Procedure Challenge. Legal argument reasoning is an essential skill that all law students must master. Moreover, it is important to dev…

Extracting Biomedical Entities from Noisy Audio Transcripts Open

Nima Ebadi, K. Marielle Morgan, Adrian Eng-Choon Tan, Billy Linares, Sheri Osborn, et al. · 2024

Computer science, Biology

Automatic Speech Recognition (ASR) technology is fundamental in transcribing spoken language into text, with considerable applications in the clinical realm, including streamlining medical transcription and integrating with Electronic Heal…

BabyStories: Can Reinforcement Learning Teach Baby Language Models to Write Better Stories? Open

Xingmeng Zhao, Tongnian Wang, Sheri Osborn, Anthony Rios · 2023

Computer science, Psychology, Physics

Language models have seen significant growth in the size of their corpus, leading to notable performance improvements. Yet, there has been limited progress in developing models that handle smaller, more human-like datasets. As part of the …

A marker-based neural network system for extracting social determinants of health Open

Xingmeng Zhao, Anthony Rios · 2023

Computer science, Political science, Philosophy

Objective: The impact of social determinants of health (SDoH) on patients’ healthcare quality and the disparity is well known. Many SDoH items are not coded in structured forms in electronic health records. These items are often captured in…

Towards Understanding the Generalization of Medical Text-to-SQL Models and Datasets Open

Richard Tarbell, Kim‐Kwang Raymond Choo, Glenn Dietrich, Anthony Rios · 2023

Computer science, Mathematics

Electronic medical records (EMRs) are stored in relational databases. It can be challenging to access the required information if the user is unfamiliar with the database schema or general database fundamentals. Hence, researchers have exp…

Bike Frames: Understanding the Implicit Portrayal of Cyclists in the News Open

Xingmeng Zhao, Xavier Walton, Suhana Shrestha, Anthony Rios · 2023

Psychology, Business, Engineering

Increasing cycling for transportation or recreation can boost health and reduce the environmental impacts of vehicles. However, news agencies' ideologies and reporting styles often influence public perception of cycling. For example, if ne…

UTSA-NLP at RadSum23: Multi-modal Retrieval-Based Chest X-Ray Report Summarization Open

Tongnian Wang, Xingmeng Zhao, Anthony Rios · 2023

Computer science, Sociology, Mathematics

Radiology report summarization aims to automatically provide concise summaries of radiology findings, reducing time and errors in manual summaries. However, current methods solely summarize the text, which overlooks critical details in the…

A Marker-based Neural Network System for Extracting Social Determinants of Health Open

Xingmeng Zhao, Anthony Rios · 2022

Computer science, Political science, Engineering

Objective. The impact of social determinants of health (SDoH) on patients' healthcare quality and the disparity is well-known. Many SDoH items are not coded in structured forms in electronic health records. These items are often captured i…
