Bhavdeep Sachdeva
Real-Time Visual Feedback to Guide Benchmark Creation: A Human-and-Metric-in-the-Loop Workflow
Recent research has shown that language models exploit 'artifacts' in benchmarks to solve tasks, rather than truly learning them, leading to inflated model performance. In pursuit of creating better benchmarks, we propose VAIDA, a novel be…
Pretrained Transformers Do not Always Improve Robustness
Pretrained Transformers (PT) have been shown to yield better Out-of-Distribution (OOD) robustness than traditional models such as Bag of Words (BOW), LSTMs, and Convolutional Neural Networks (CNNs) powered by Word2Vec and GloVe embeddings. How does …
NumGLUE: A Suite of Fundamental yet Challenging Mathematical Reasoning Tasks
Given the ubiquitous nature of numbers in text, reasoning with numbers to perform simple calculations is an important skill of AI systems. While many datasets and models have been developed to this end, state-of-the-art AI systems are brit…
Swaroop Mishra, Arindam Mitra, Neeraj Varshney, Bhavdeep Sachdeva, Peter Clark, Chitta Baral, Ashwin Kalyan. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.
Generalized but not Robust? Comparing the Effects of Data Modification Methods on Out-of-Domain Generalization and Adversarial Robustness
Data modification, whether via additional training datasets, data augmentation, debiasing, or dataset filtering, has been proposed as an effective solution for generalizing to out-of-domain (OOD) inputs, in both natural language processing…
DQI: A Guide to Benchmark Evaluation
A 'state of the art' model A surpasses humans in a benchmark B, but fails on similar benchmarks C, D, and E. What does B have that the other benchmarks do not? Recent research provides the answer: spurious bias. However, developing A to so…
Towards Question Format Independent Numerical Reasoning: A Set of Prerequisite Tasks
Numerical reasoning is often important to accurately understand the world. Recently, several format-specific datasets have been proposed, such as numerical reasoning in the settings of Natural Language Inference (NLI), Reading Comprehensio…
DQI: Measuring Data Quality in NLP
Neural language models have achieved human-level performance across several NLP datasets. However, recent studies have shown that these models are not truly learning the desired task; rather, their high performance is attributed to overfit…
Do We Need to Create Big Datasets to Learn a Task?
Deep Learning research has been largely accelerated by the development of huge datasets such as ImageNet. The general trend has been to create big datasets to make a deep neural network learn. A huge amount of resources is being spent in c…