Samson Tan
Learning to Generate Answers with Citations via Factual Consistency Models
Large Language Models (LLMs) frequently hallucinate, impeding their reliability in mission-critical situations. One approach to address this issue is to provide citations to relevant sources alongside generated content, enhancing the verif…
Lessons from the Trenches on Reproducible Evaluation of Language Models
Effective evaluation of language models remains an open challenge in NLP. Researchers and engineers face methodological issues such as the sensitivity of models to evaluation setup, difficulty of proper comparisons across methods, and the …
Extreme Miscalibration and the Illusion of Adversarial Robustness
Deep learning-based Natural Language Processing (NLP) models are vulnerable to adversarial attacks, where small perturbations can cause a model to misclassify. Adversarial Training (AT) is often used to increase model robustness. However, …
Automatic Feature Fairness in Recommendation via Adversaries
Fairness is a widely discussed topic in recommender systems, but its practical implementation faces challenges in defining sensitive features while maintaining recommendation accuracy. We propose feature fairness as the foundation to achie…
Large Language Models of Code Fail at Completing Code with Potential Bugs
Large language models of code (Code-LLMs) have recently brought tremendous advances to code completion, a fundamental feature of programming assistance and code intelligence. However, most existing works ignore the possible presence of bug…
NL-Augmenter 🦎 → 🐍 A Framework for Task-Sensitive Natural Language Augmentation
Data augmentation is an important method for evaluating the robustness of NLP models and for enhancing the diversity of their training data. In this paper, we present NL-Augmenter, a new participatory Python-based n…
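To make the framework's design concrete, here is a minimal, self-contained sketch of a transformation written in the style of NL-Augmenter's SentenceOperation interface; the base class is re-declared locally for illustration, so the snippet does not depend on the library's actual import paths or registration machinery.

```python
# A minimal, self-contained sketch in the style of NL-Augmenter's
# SentenceOperation interface (the real library's import paths and
# registration machinery are omitted; names here are illustrative).
from typing import List


class SentenceOperation:
    """Base class: a transformation maps one sentence to candidate variants."""

    def generate(self, sentence: str) -> List[str]:
        raise NotImplementedError


class NumberToWords(SentenceOperation):
    """Toy task-sensitive augmentation: spell out small digits."""

    WORDS = {"0": "zero", "1": "one", "2": "two", "3": "three"}

    def generate(self, sentence: str) -> List[str]:
        tokens = [self.WORDS.get(tok, tok) for tok in sentence.split()]
        return [" ".join(tokens)]


print(NumberToWords().generate("I have 2 cats and 1 dog"))
# ['I have two cats and one dog']
```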
TraVLR: Now You See It, Now You Don’t! A Bimodal Dataset for Evaluating Visio-Linguistic Reasoning
Numerous visio-linguistic (V+L) representation learning methods have been developed, yet existing datasets do not adequately evaluate the extent to which they represent visual and linguistic concepts in a unified space. We propose several …
ReCode: Robustness Evaluation of Code Generation Models
Code generation models have achieved impressive performance. However, they tend to be brittle as slight edits to a prompt could lead to very different generations; these robustness properties, critical for user experience when deployed in …
Shiqi Wang, Zheng Li, Haifeng Qian, Chenghao Yang, Zijian Wang, Mingyue Shang, Varun Kumar, Samson Tan, Baishakhi Ray, Parminder Bhatia, Ramesh Nallapati, Murali Krishna Ramanathan, Dan Roth, Bing Xiang. Proceedings of the 61st Annual Meet…
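As an illustration of the kind of perturbation such a benchmark can apply (a sketch, not ReCode's actual implementation), the snippet below consistently renames one variable in a Python prompt; a robust model should complete both versions equivalently.

```python
# Sketch of one perturbation family: consistent variable renaming in a
# partial-code prompt (illustrative only, not ReCode's implementation).
import ast


class RenameVariable(ast.NodeTransformer):
    """Renames every occurrence of `old` to `new` in a Python AST."""

    def __init__(self, old: str, new: str):
        self.old, self.new = old, new

    def visit_Name(self, node: ast.Name) -> ast.Name:
        if node.id == self.old:
            node.id = self.new
        return node

    def visit_arg(self, node: ast.arg) -> ast.arg:
        if node.arg == self.old:
            node.arg = self.new
        return node


def perturb(source: str, old: str, new: str) -> str:
    tree = RenameVariable(old, new).visit(ast.parse(source))
    return ast.unparse(tree)  # requires Python 3.9+


prompt = "def add(a, b):\n    return a + b"
print(perturb(prompt, "a", "x"))
# def add(x, b):
#     return x + b
```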
BotSIM: An End-to-End Bot Simulation Framework for Commercial Task-Oriented Dialog Systems
We present BotSIM, a data-efficient end-to-end Bot SIMulation toolkit for commercial text-based task-oriented dialog (TOD) systems. BotSIM consists of three major components: 1) a Generator that can infer semantic-level dialog acts and ent…
Whodunit? Learning to Contrast for Authorship Attribution
Authorship attribution is the task of identifying the author of a given text. The key is finding representations that can differentiate between authors. Existing approaches typically use manually designed features that capture a dataset's …
BotSIM: An End-to-End Bot Simulation Framework for Commercial Task-Oriented Dialog Systems
We present BotSIM, a data-efficient end-to-end Bot SIMulation framework for commercial task-oriented dialog (TOD) systems. BotSIM consists of three major components: 1) a Generator that can infer semantic-level dialog acts and entities fro…
Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP
What are the units of text that we want to model? From bytes to multi-word expressions, text can be analyzed and generated at many granularities. Until recently, most natural language processing (NLP) models operated over words, treating t…
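As one concrete point on that spectrum of granularities, the toy snippet below runs a few byte-pair encoding (BPE) merges, the classic subword scheme the survey discusses; it is a pedagogical sketch, not a production tokenizer.

```python
# Toy byte-pair encoding (BPE) merges: repeatedly fuse the most
# frequent adjacent symbol pair in a word-frequency table.
from collections import Counter


def most_frequent_pair(words):
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]


def merge_pair(words, pair):
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])  # fuse the pair
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged


# Words split into characters, with corpus frequencies.
vocab = {tuple("lower"): 5, tuple("lowest"): 2, tuple("newer"): 6}
for _ in range(3):  # learn three merges
    pair = most_frequent_pair(vocab)
    vocab = merge_pair(vocab, pair)
    print("merged", pair, "->", list(vocab))
```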
Interpreting the Robustness of Neural NLP Models to Textual Perturbations
Modern Natural Language Processing (NLP) models are known to be sensitive to input perturbations and their performance can decrease when applied to real-world, noisy data. However, it is still unclear why models are less robust to some per…
10.18653/v1/2022.findings-acl.315
Causally Estimating the Sensitivity of Neural NLP Models to Spurious Features
Recent work finds that modern natural language processing (NLP) models rely on spurious features for prediction. Mitigating such effects is thus important. Despite this need, there is no quantitative measure to evaluate or compare the effect…
Code-Mixing on Sesame Street: Dawn of the Adversarial Polyglots
Multilingual models have demonstrated impressive cross-lingual transfer performance. However, test sets like XNLI are monolingual at the example level. In multilingual communities, it is common for polyglots to code-mix when conversing wit…
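To sketch the core idea (this toy is illustrative, not the paper's actual attack), the snippet below greedily swaps words for dictionary translations whenever the swap lowers a victim classifier's confidence; the lexicon and the victim model are stand-ins.

```python
# Toy sketch of an adversarial code-mixing search (illustrative only):
# greedily swap words for translations whenever the swap lowers the
# victim model's confidence in the correct label.
from typing import Callable, Dict, List

# Stand-in English -> Malay dictionary; a real attack would draw on
# bilingual lexicons for many language pairs.
LEXICON: Dict[str, str] = {"eat": "makan", "now": "sekarang", "can": "boleh"}


def code_mix_attack(
    sentence: str,
    confidence: Callable[[str], float],  # victim's P(correct label | text)
) -> str:
    tokens: List[str] = sentence.split()
    best = confidence(sentence)
    for i, tok in enumerate(tokens):
        if tok.lower() in LEXICON:
            candidate = tokens.copy()
            candidate[i] = LEXICON[tok.lower()]
            score = confidence(" ".join(candidate))
            if score < best:  # keep swaps that hurt the model
                tokens, best = candidate, score
    return " ".join(tokens)


# Dummy victim: pretends to be less confident on mixed-language input.
dummy = lambda text: 1.0 / (1 + sum(t in LEXICON.values() for t in text.split()))
print(code_mix_attack("I can eat now", dummy))  # 'I boleh makan sekarang'
```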
Robustness Gym: Unifying the NLP Evaluation Landscape
Despite impressive performance on standard benchmarks, deep neural networks are often brittle when deployed in real-world systems. Consequently, recent research has focused on testing the robustness of such models, resulting in a diverse s…
Mind Your Inflections! Improving NLP for Non-Standard English with Base-Inflection Encoding
Morphological inflection is a process of word formation where base words are modified to express different grammatical categories such as tense, case, voice, person, or number. World Englishes, such as Colloquial Singapore English (CSE) an…
Mind Your Inflections! Improving NLP for Non-Standard Englishes with Base-Inflection Encoding
Inflectional variation is a common feature of World Englishes such as Colloquial Singapore English and African American Vernacular English. Although comprehension by human readers is usually unimpaired by non-standard inflections, current …
10.18653/v1/2020.emnlp-main.455
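The encoding idea can be sketched in a few lines (illustrative only; the paper uses a real morphological analyzer rather than a lookup table): each word is rewritten as its base form plus a separate inflection symbol, and the inflection is reapplied on decoding.

```python
# Toy sketch of base-inflection encoding (illustrative; the lookup
# tables below stand in for a real morphological analyzer).
ANALYZE = {"dogs": ("dog", "NNS"), "ate": ("eat", "VBD"), "dog": ("dog", "NN")}
INFLECT = {("dog", "NNS"): "dogs", ("eat", "VBD"): "ate", ("dog", "NN"): "dog"}


def encode(sentence: str) -> list:
    out = []
    for word in sentence.split():
        base, tag = ANALYZE.get(word, (word, None))
        out.append(base)
        if tag:
            out.append(f"<{tag}>")  # inflection carried as a separate symbol
    return out


def decode(tokens: list) -> str:
    words, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i + 1].startswith("<"):
            tag = tokens[i + 1].strip("<>")
            words.append(INFLECT.get((tokens[i], tag), tokens[i]))
            i += 2
        else:
            words.append(tokens[i])
            i += 1
    return " ".join(words)


encoded = encode("the dogs ate")       # ['the', 'dog', '<NNS>', 'eat', '<VBD>']
print(encoded, "->", decode(encoded))  # round-trips to: the dogs ate
```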
It’s Morphin’ Time! Combating Linguistic Discrimination with Inflectional Perturbations
Training on only perfect Standard English corpora predisposes pre-trained neural networks to discriminate against minorities from non-standard linguistic backgrounds (e.g., African American Vernacular English, Colloquial Singapore Engli…
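A toy version of such an inflectional perturbation (illustrative only; the paper derives candidate forms from a real morphological inflection tool, not a hand-written table) replaces a word with another inflection of the same lemma and keeps the variant that most reduces a victim model's confidence:

```python
# Toy sketch of an inflectional perturbation search (illustrative only).
from typing import Callable

# Stand-in inflection table; a real attack enumerates forms with a
# morphological inflection tool.
INFLECTIONS = {"eat": ["eat", "eats", "ate", "eaten", "eating"]}
LEMMA = {w: lemma for lemma, forms in INFLECTIONS.items() for w in forms}


def perturb(sentence: str, confidence: Callable[[str], float]) -> str:
    tokens = sentence.split()
    best_text, best_score = sentence, confidence(sentence)
    for i, tok in enumerate(tokens):
        for form in INFLECTIONS.get(LEMMA.get(tok, ""), []):
            candidate = " ".join(tokens[:i] + [form] + tokens[i + 1 :])
            score = confidence(candidate)
            if score < best_score:  # keep the most damaging inflection
                best_text, best_score = candidate, score
    return best_text


# Dummy victim that is thrown off by the bare form "eat" after "he".
dummy = lambda text: 0.4 if "he eat " in f"{text} " else 0.9
print(perturb("he eats lunch", dummy))  # 'he eat lunch'
```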