Anthony Rios
Can We Reliably Rank Model Performance across Domains without Labeled Data?
Estimating model performance without labels is an important goal for understanding how NLP models generalize. While prior work has proposed measures based on dataset similarity or predicted correctness, it remains unclear when these estima…
Role-Conditioned Refusals: Evaluating Access Control Reasoning in Large Language Models
Access control is a cornerstone of secure computing, yet large language models often blur role boundaries by producing unrestricted responses. We study role-conditioned refusals, focusing on the LLM's ability to adhere to access control po…
Bike Frames: Understanding the Implicit Portrayal of Cyclists in the News
Increasing cycling for transportation or recreation can boost public health and reduce the environmental impacts of vehicles. However, news agencies' ideologies and reporting styles often influence public perception of cycling. For example…
Does It Run and Is That Enough? Revisiting Text-to-Chart Generation with a Multi-Agent Approach
Large language models can translate natural-language chart descriptions into runnable code, yet approximately 15% of the generated scripts still fail to execute, even after supervised fine-tuning and reinforcement learning. We investigate…
A Multi-Agent Framework for Mitigating Dialect Biases in Privacy Policy Question-Answering Systems
Privacy policies inform users about data collection and usage, yet their complexity limits accessibility for diverse populations. Existing Privacy Policy Question Answering (QA) systems exhibit performance disparities across English dialec…
Charting the Future: Using Chart Question-Answering for Scalable Evaluation of LLM-Driven Data Visualizations
We propose a novel framework that leverages Visual Question Answering (VQA) models to automate the evaluation of LLM-generated data visualizations. Traditional evaluation methods often rely on human judgment, which is costly and unscalable…
Improving LLM Reasoning with Multi-Agent Tree-of-Thought Validator Agent
Multi-agent strategies have emerged as a promising approach to enhance the reasoning abilities of Large Language Models (LLMs) by assigning specialized roles in the problem-solving process. Concurrently, Tree of Thoughts (ToT) methods have…
Enhancing Event Reasoning in Large Language Models through Instruction Fine-Tuning with Semantic Causal Graphs
Event detection and text reasoning have become critical applications across various domains. While LLMs have recently demonstrated impressive progress in reasoning abilities, they often struggle with event detection, particularly due to th…
RASTeR: Robust, Agentic, and Structured Temporal Reasoning
Temporal question answering (TQA) remains a challenge for large language models (LLMs), particularly when retrieved content may be irrelevant, outdated, or temporally inconsistent. This is especially critical in applications like clinical …
Beyond Text-to-SQL for IoT Defense: A Comprehensive Framework for Querying and Classifying IoT Threats
Recognizing the promise of natural language interfaces to databases, prior studies have emphasized the development of text-to-SQL systems. While substantial progress has been made in this field, existing research has concentrated on genera…
Improving Expert Radiology Report Summarization by Prompting Large Language Models with a Layperson Summary
Radiology report summarization (RRS) is crucial for patient care, requiring concise "Impressions" from detailed "Findings." This paper introduces a novel prompting strategy to enhance RRS by first generating a layperson summary. This appro…
Unmasking Database Vulnerabilities: Zero-Knowledge Schema Inference Attacks in Text-to-SQL Systems
Text-to-SQL systems empower users to interact with databases using natural language, automatically translating queries into executable SQL code. However, their reliance on database schema information for SQL generation exposes them to sign…
Team UTSA-NLP at SemEval 2024 Task 5: Prompt Ensembling for Argument Reasoning in Civil Procedures with GPT4
In this paper, we present our system for the SemEval Task 5, The Legal Argument Reasoning Task in Civil Procedure Challenge. Legal argument reasoning is an essential skill that all law students must master. Moreover, it is important to dev…
Extracting Biomedical Entities from Noisy Audio Transcripts
Automatic Speech Recognition (ASR) technology is fundamental in transcribing spoken language into text, with considerable applications in the clinical realm, including streamlining medical transcription and integrating with Electronic Heal…
BabyStories: Can Reinforcement Learning Teach Baby Language Models to Write Better Stories?
Language models have seen significant growth in the size of their corpus, leading to notable performance improvements. Yet, there has been limited progress in developing models that handle smaller, more human-like datasets. As part of the …
A marker-based neural network system for extracting social determinants of health
Objective: The impact of social determinants of health (SDoH) on patients’ healthcare quality and the disparity is well known. Many SDoH items are not coded in structured forms in electronic health records. These items are often captured in…
Towards Understanding the Generalization of Medical Text-to-SQL Models and Datasets
Electronic medical records (EMRs) are stored in relational databases. It can be challenging to access the required information if the user is unfamiliar with the database schema or general database fundamentals. Hence, researchers have exp…
Bike Frames: Understanding the Implicit Portrayal of Cyclists in the News
Increasing cycling for transportation or recreation can boost health and reduce the environmental impacts of vehicles. However, news agencies' ideologies and reporting styles often influence public perception of cycling. For example, if ne…
BabyStories: Can Reinforcement Learning Teach Baby Language Models to Write Better Stories?
Language models have seen significant growth in the size of their corpus, leading to notable performance improvements. Yet, there has been limited progress in developing models that handle smaller, more human-like datasets. As part of the Ba…
UTSA-NLP at RadSum23: Multi-modal Retrieval-Based Chest X-Ray Report Summarization
Radiology report summarization aims to automatically provide concise summaries of radiology findings, reducing time and errors in manual summaries. However, current methods solely summarize the text, which overlooks critical details in the…
A Marker-based Neural Network System for Extracting Social Determinants of Health
Objective. The impact of social determinants of health (SDoH) on patients' healthcare quality and the disparity is well-known. Many SDoH items are not coded in structured forms in electronic health records. These items are often captured i…