Davis Liang
From Feedback to Checklists: Grounded Evaluation of AI-Generated Clinical Notes
AI-generated clinical notes are increasingly used in healthcare, but evaluating their quality remains a challenge due to high subjectivity and limited scalability of expert review. Existing automated metrics often fail to align with real-w…
The Curious Language Model: Strategic Test-Time Information Acquisition
Decision-makers often possess insufficient information to render a confident decision. In these cases, the decision-maker can often undertake actions to acquire the necessary information about the problem at hand, e.g., by consulting knowl…
RoAST: Robustifying Language Models via Adversarial Perturbation with Selective Training
Fine-tuning pre-trained language models (LMs) has become the de facto standard in many NLP tasks. Nevertheless, fine-tuned LMs are still prone to robustness issues, such as adversarial robustness and model calibration. Several perspectives…
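As a rough illustration of the underlying idea, adversarial perturbation in embedding space during fine-tuning, here is an FGSM-style sketch. The model callable returning logits, the epsilon value, and the clean-plus-perturbed objective are assumptions on my part; RoAST's selective parameter-update rule is not reproduced.

import torch

def roast_style_loss(model, embeds, labels, loss_fn, epsilon=1e-2):
    # Perturb input embeddings along the gradient sign of the task loss
    # (FGSM-style), then train on clean + perturbed views.
    embeds = embeds.detach().requires_grad_(True)
    clean_loss = loss_fn(model(embeds), labels)
    (grad,) = torch.autograd.grad(clean_loss, embeds, retain_graph=True)
    perturbed = (embeds + epsilon * grad.sign()).detach()
    adv_loss = loss_fn(model(perturbed), labels)
    return clean_loss + adv_loss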
Co-training and Co-distillation for Quality Improvement and Compression of Language Models
Knowledge Distillation (KD) compresses computationally expensive pre-trained language models (PLMs) by transferring their knowledge to smaller models, allowing their use in resource-constrained or real-time settings. However, most smaller …
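For context, a minimal sketch of the vanilla knowledge-distillation objective this line of work builds on. The temperature T, the mixing weight alpha, and the shared label space are illustrative assumptions; the paper's co-training/co-distillation framework is not reproduced here.

import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-target term: KL between temperature-scaled teacher and student
    # distributions, scaled by T^2 as in standard KD.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy on the gold labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard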
The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants
We present Belebele, a multiple-choice machine reading comprehension (MRC) dataset spanning 122 language variants. Significantly expanding the language coverage of natural language understanding (NLU) benchmarks, this dataset enables the e…
A Study on Knowledge Distillation from Weak Teacher for Scaling Up Pre-trained Language Models
Distillation from Weak Teacher (DWT) is a method of transferring knowledge from a smaller, weaker teacher model to a larger student model to improve its performance. Previous studies have shown that DWT can be effective in the vision domai…
XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models
Large multilingual language models typically rely on a single vocabulary shared across 100+ languages. As these models have increased in parameter count and depth, vocabulary size has remained largely unchanged. This vocabulary bot…
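As a loose illustration of the kind of intervention studied here (XLM-V scales the shared vocabulary to roughly one million tokens), this is how one might train a much larger SentencePiece vocabulary. The corpus path and all settings below are placeholders, not the paper's recipe.

import sentencepiece as spm

# Placeholder corpus path; XLM-V's actual training data and settings differ.
spm.SentencePieceTrainer.train(
    input="multilingual_corpus.txt",
    model_prefix="large_vocab",
    vocab_size=1_000_000,        # the 1M-token scale explored by XLM-V
    model_type="unigram",
    character_coverage=0.9995,
)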
Generating Hashtags for Short-form Videos with Guided Signals
Short-form video hashtag recommendation (SVHR) aims to recommend hashtags to content creators from videos and corresponding descriptions. Most prior studies regard SVHR as a classification or ranking problem and select hashtags from a set …
Query Rewriting for Effective Misinformation Discovery
We propose a novel system to help fact-checkers formulate search queries for known misinformation claims and effectively search across multiple social media platforms. We introduce an adaptable rewriting strategy, where editing actions for…
Attention-guided Generative Models for Extractive Question Answering
We propose a novel method for applying Transformer models to extractive question answering (QA) tasks. Recently, pretrained generative sequence-to-sequence (seq2seq) models have achieved great success in question answering. Contributing to…
Multiplicative Position-aware Transformer Models for Language Understanding
Transformer models, which leverage architectural improvements like self-attention, perform remarkably well on Natural Language Processing (NLP) tasks. The self-attention mechanism is position agnostic. In order to capture positional orderi…
Decoding and Diversity in Machine Translation
Neural Machine Translation (NMT) systems are typically evaluated using automated metrics that assess the agreement between generated translations and ground truth candidates. To improve systems with respect to these metrics, NLP researcher…
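To make the decoding-strategy contrast concrete, a small example using a public MT checkpoint (the model name is just an example, not one used in the paper): beam search tends to maximize metric scores, while sampling trades some of that score for diversity.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

name = "Helsinki-NLP/opus-mt-en-de"          # example checkpoint only
tok = AutoTokenizer.from_pretrained(name)
mt = AutoModelForSeq2SeqLM.from_pretrained(name)
inputs = tok("The weather is nice today.", return_tensors="pt")

# Beam search: a single high-scoring hypothesis.
beam = mt.generate(**inputs, num_beams=5)
# Top-k sampling: several more diverse hypotheses.
sampled = mt.generate(**inputs, do_sample=True, top_k=50, num_return_sequences=3)

print(tok.batch_decode(beam, skip_special_tokens=True))
print(tok.batch_decode(sampled, skip_special_tokens=True))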
Improve Transformer Models with Better Relative Position Embeddings
Transformer architectures rely on explicit position encodings in order to preserve a notion of word order. In this paper, we argue that existing work does not fully utilize position information. For example, the initial proposal of a sinus…
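For orientation, a generic sketch of additive relative-position biasing in self-attention; the paper argues for richer interactions between queries, keys, and relative-position embeddings than this baseline, which is all that is shown here.

import torch

def attention_with_relative_bias(q, k, v, rel_bias):
    # q, k, v: (batch, heads, seq, dim); rel_bias: (heads, seq, seq).
    scores = torch.matmul(q, k.transpose(-2, -1)) / q.size(-1) ** 0.5
    scores = scores + rel_bias                 # additive relative-position term
    return torch.matmul(torch.softmax(scores, dim=-1), v)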
Embedding-based Zero-shot Retrieval through Query Generation
Passage retrieval addresses the problem of locating relevant passages, usually from a large corpus, given a query. In practice, lexical term-matching algorithms like BM25 are popular choices for retrieval owing to their efficiency. However…
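The retrieval step itself reduces to nearest-neighbor search over embeddings. The sketch below assumes hypothetical encode_query and encode_passages functions; the paper's contribution, training the encoders on synthetically generated queries, happens upstream of this step.

import torch

def retrieve(query, passages, encode_query, encode_passages, k=5):
    # Rank passages by dot-product similarity in a shared embedding space.
    q = encode_query(query)                    # (dim,)
    p = encode_passages(passages)              # (num_passages, dim)
    scores = p @ q
    top = torch.topk(scores, k=min(k, len(passages)))
    return [(passages[i], s.item()) for i, s in zip(top.indices, top.values)]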
TRANS-BLSTM: Transformer with Bidirectional LSTM for Language Understanding
Bidirectional Encoder Representations from Transformers (BERT) has recently achieved state-of-the-art performance on a broad range of NLP tasks including sentence classification, machine translation, and question answering. The BERT model …
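A minimal sketch of the architectural idea as suggested by the title and abstract: run a bidirectional LSTM over the transformer's hidden states before the task head. All sizes are illustrative, not the paper's configuration.

import torch.nn as nn

class TransBLSTMHead(nn.Module):
    def __init__(self, hidden=768, lstm_hidden=384, num_labels=2):
        super().__init__()
        # BiLSTM over the transformer's output sequence.
        self.blstm = nn.LSTM(hidden, lstm_hidden, batch_first=True,
                             bidirectional=True)
        self.classifier = nn.Linear(2 * lstm_hidden, num_labels)

    def forward(self, hidden_states):          # (batch, seq, hidden)
        out, _ = self.blstm(hidden_states)
        return self.classifier(out)            # per-token logits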
Improve Transformer Models with Better Relative Position Embeddings
The transformer model has demonstrated superior results on NLP tasks including machine translation and question answering. In this paper, we argue that the position information is not fully utilized in existing work. For example, the initi…
Pseudolikelihood Reranking with Masked Language Models
We rerank with scores from pretrained masked language models like BERT to improve ASR and NMT performance. These log-pseudolikelihood scores (LPLs) can outperform large, autoregressive language models (GPT-2) in out-of-the-box scoring. RoB…
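The scoring idea is easy to state in code: mask each position in turn and sum the masked LM's log-probability of the true token. A naive one-forward-pass-per-token sketch follows; the checkpoint name is just an example.

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

def pseudo_log_likelihood(sentence, name="roberta-base"):
    tok = AutoTokenizer.from_pretrained(name)
    mlm = AutoModelForMaskedLM.from_pretrained(name).eval()
    ids = tok(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    with torch.no_grad():
        for i in range(1, len(ids) - 1):        # skip special tokens
            masked = ids.clone()
            masked[i] = tok.mask_token_id
            logits = mlm(masked.unsqueeze(0)).logits[0, i]
            total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total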
Learning Noise-Invariant Representations for Robust Speech Recognition
Despite rapid advances in speech recognition, current models remain brittle to superficial perturbations to their inputs. Small amounts of noise can destroy the performance of an otherwise state-of-the-art model. To harden models against b…
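One common way to harden a model along these lines is an invariance penalty between representations of clean and perturbed inputs, added to the task loss. This is a hedged sketch of that general recipe, not the paper's exact objective.

import torch.nn.functional as F

def invariance_penalty(encoder, clean_batch, noisy_batch, weight=1.0):
    # Encourage the encoder to map clean and noise-augmented inputs to
    # nearby representations; add this term to the usual ASR training loss.
    h_clean = encoder(clean_batch)
    h_noisy = encoder(noisy_batch)
    return weight * F.mse_loss(h_noisy, h_clean.detach())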
Deep Automated Multi-task Learning
Multi-task learning (MTL) has recently contributed to learning better representations in service of various NLP tasks. MTL aims at improving the performance of a primary task, by jointly training on a secondary task. This paper introduces …
Automated Multi-task Learning
Multi-task learning (MTL) has recently contributed to learning better representations in service of various natural language processing (NLP) tasks. MTL aims at improving the performance of a primary task by jointly training on a secondary task. This…