Chieh-Yang Huang
Using Contextually Aligned Online Reviews to Measure LLMs' Performance Disparities Across Language Varieties
A language can have different varieties. These varieties can affect the performance of natural language processing (NLP) models, including large language models (LLMs), which are often trained on data from widely spoken varieties. This pap…
Do Large Multimodal Models Solve Caption Generation for Scientific Figures? Lessons Learned from SciCap Challenge 2023
Since the SciCap dataset's launch in 2021, the research community has made significant progress in generating captions for scientific figures in scholarly articles. In 2023, the first SciCap Challenge took place, inviting global teams to us…
Multi-LLM Collaborative Caption Generation in Scientific Documents
Scientific figure captioning is a complex task that requires generating contextually appropriate descriptions of visual content. However, existing methods often fall short by utilizing incomplete information, treating the task solely as ei…
Generating Educational Materials with Different Levels of Readability using LLMs
This study introduces the leveled-text generation task, aiming to rewrite educational materials to specific readability levels while preserving meaning. We assess the capability of GPT-3.5, LLaMA-2 70B, and Mixtral 8x7B to generate conten…
If in a Crowdsourced Data Annotation Pipeline, a GPT-4
Recent studies indicated GPT-4 outperforms online crowd workers in data labeling accuracy, notably workers from Amazon Mechanical Turk (MTurk). However, these studies were criticized for deviating from standard crowdsourcing practices a…
How Does Conversation Length Impact User’s Satisfaction? A Case Study of Length-Controlled Conversations with LLM-Powered Chatbots
Users can discuss a wide range of topics with large language models (LLMs), but they do not always prefer solving problems or getting information through lengthy conversations. This raises an intriguing HCI question: How does instructing L…
SciCapenter: Supporting Caption Composition for Scientific Figures with Machine-Generated Captions and Ratings
Crafting effective captions for figures is important. Readers heavily depend on these captions to grasp the figure's message. However, despite a well-developed set of AI technologies for figures and captions, these have rarely been test…
Inspo: Writing with Crowds Alongside AI
The use of artificial intelligence (AI) to support creative writing has bloomed in recent years. However, it is less well understood how AI compares to on-demand human support. We explored how writers interact with both AI and crowd worker…
GPT-4 as an Effective Zero-Shot Evaluator for Scientific Figure Captions
There is growing interest in systems that generate captions for scientific figures. However, assessing these systems' output poses a significant challenge. Human evaluation requires academic expertise and is costly, while automatic evaluati…
Good Data, Large Data, or No Data? Comparing Three Approaches in Developing Research Aspect Classifiers for Biomedical Papers
The rapid growth of scientific publications, particularly during the COVID-19 pandemic, emphasizes the need for tools to help researchers efficiently comprehend the latest advancements. One essential part of understanding scientific litera…
ConvXAI: Delivering Heterogeneous AI Explanations via Conversations to Support Human-AI Scientific Writing
Despite a surge of XAI methods, users still struggle to obtain required AI explanations. Previous research suggests chatbots as dynamic solutions, but the effective design of conversational XAI agents for practical human needs r…
What Types of Questions Require Conversation to Answer? A Case Study of AskReddit Questions
The proliferation of automated conversational systems, such as chatbots, spoken-dialogue systems, and smart speakers, has significantly impacted modern digital life. However, these systems are primarily designed to provide answers to well-d…
Summaries as Captions: Generating Figure Captions for Scientific Documents with Automated Text Summarization
Good figure captions help paper readers understand complex scientific figures. Unfortunately, even published papers often have poorly written captions. Automatic caption generation could aid paper writers by providing good starting caption…
Conveying the Predicted Future to Users: A Case Study of Story Plot Prediction
Creative writing is hard: Novelists struggle with writer's block daily. While automatic story generation has advanced recently, it is treated as a "toy task" for advancing artificial intelligence rather than helping people. In this paper, …
Summaries as Captions: Generating Figure Captions for Scientific Documents with Automated Text Summarization
Chieh-Yang Huang, Ting-Yao Hsu, Ryan Rossi, Ani Nenkova, Sungchul Kim, Gromit Yeuk-Yin Chan, Eunyee Koh, C Lee Giles, Ting-Hao Huang. Proceedings of the 16th International Natural Language Generation Conference. 2023.
Too Slow to Be Useful? On Incorporating Humans in the Loop of Smart Speakers
Real-time crowd-powered systems, such as Chorus/Evorus, VizWiz, and Apparition, have shown how incorporating humans into automated systems could supplement where the automatic solutions fall short. However, one unspoken bottleneck of apply…
Extracting Salient Facts from Company Reviews with Scarce Labels
In this paper, we propose the task of extracting salient facts from online company reviews. Salient facts present unique and distinctive information about a company, which helps the user in deciding whether to apply to the company. We form…
Distilling Salient Reviews with Zero Labels
Many people read online reviews to learn about real-world entities of their interest. However, the majority of reviews only describe general experiences and opinions of the customers, and may not reveal facts that are specific to the entity b…
Semantic Frame Forecast
This paper introduces semantic frame forecast, a task that predicts the semantic frames that will occur in the next 10, 100, or even 1,000 sentences in a running story. Prior work focused on predicting the immediate future of a story, such…
Assessing the Helpfulness of Learning Materials with Inference-Based Learner-Like Agent
Many English-as-a-second-language learners have trouble using near-synonym words (e.g., small vs. little; briefly vs. shortly) correctly, and often look for example sentences to learn how two nearly synonymous terms differ. Prior work uses h…
TEST_POSITIVE at W-NUT 2020 Shared Task-3: Joint Event Multi-task Learning for Slot Filling in Noisy Text
The goal of the competition on extracting COVID-19 events from Twitter is to develop systems that can automatically extract related events from tweets. The built system should identify different pre-defined slots for each event, in order to answer imp…
CODA-19: Using a Non-Expert Crowd to Annotate Research Aspects on 10,000+ Abstracts in the COVID-19 Open Research Dataset
This paper introduces CODA-19, a human-annotated dataset that codes the Background, Purpose, Method, Finding/Contribution, and Other sections of 10,966 English abstracts in the COVID-19 Open Research Dataset. CODA-19 was created by 248 cro…
TEST_POSITIVE at W-NUT 2020 Shared Task-3: Cross-task modeling
The goal of the competition on extracting COVID-19 events from Twitter is to develop systems that can automatically extract related events from tweets. The built system should identify different pre-defined slots for each event, in order to answer imp…