Bhavdeep Sachdeva
Real-Time Visual Feedback to Guide Benchmark Creation: A Human-and-Metric-in-the-Loop Workflow
Recent research has shown that language models exploit 'artifacts' in benchmarks to solve tasks, rather than truly learning them, leading to inflated model performance. In pursuit of creating better benchmarks, we propose VAIDA, a novel be…
Pretrained Transformers Do not Always Improve Robustness
Pretrained Transformers (PT) have been shown to yield better Out-of-Distribution (OOD) robustness than traditional models such as Bag of Words (BOW), LSTMs, and Convolutional Neural Networks (CNNs) powered by Word2Vec and GloVe embeddings. How does …
NumGLUE: A Suite of Fundamental yet Challenging Mathematical Reasoning Tasks
Given the ubiquitous nature of numbers in text, reasoning with numbers to perform simple calculations is an important skill of AI systems. While many datasets and models have been developed to this end, state-of-the-art AI systems are brit…
Swaroop Mishra, Arindam Mitra, Neeraj Varshney, Bhavdeep Sachdeva, Peter Clark, Chitta Baral, Ashwin Kalyan. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.
Generalized but not Robust? Comparing the Effects of Data Modification Methods on Out-of-Domain Generalization and Adversarial Robustness
Data modification, whether via additional training datasets, data augmentation, debiasing, or dataset filtering, has been proposed as an effective solution for generalizing to out-of-domain (OOD) inputs, in both natural language processing…
DQI: A Guide to Benchmark Evaluation
A 'state of the art' model A surpasses humans in a benchmark B, but fails on similar benchmarks C, D, and E. What does B have that the other benchmarks do not? Recent research provides the answer: spurious bias. However, developing A to so…
Towards Question Format Independent Numerical Reasoning: A Set of Prerequisite Tasks
Numerical reasoning is often important to accurately understand the world. Recently, several format-specific datasets have been proposed, such as numerical reasoning in the settings of Natural Language Inference (NLI), Reading Comprehensio…
DQI: Measuring Data Quality in NLP
Neural language models have achieved human-level performance across several NLP datasets. However, recent studies have shown that these models are not truly learning the desired task; rather, their high performance is attributed to overfit…
Do We Need to Create Big Datasets to Learn a Task?
Deep Learning research has been largely accelerated by the development of huge datasets such as ImageNet. The general trend has been to create big datasets to make a deep neural network learn. A huge amount of resources is being spent in c…