Arkil Patel
AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories
Web agents enable users to perform tasks on web browsers through natural language interaction. Evaluating web agent trajectories is an important problem, since it helps us determine whether the agent successfully completed the tasks. Rule…
How to Get Your LLM to Generate Challenging Problems for Evaluation
The pace of evolution of Large Language Models (LLMs) necessitates new approaches for rigorous and comprehensive evaluation. Traditional human annotation is increasingly impracticable due to the complexities and costs involved in generatin…
Investigating Adversarial Trigger Transfer in Large Language Models
Recent work has developed optimization procedures to find token sequences, called adversarial triggers, which can elicit unsafe responses from aligned language models. These triggers are believed to be highly transferable, i.e., a trigger …
Evaluating In-Context Learning of Libraries for Code Generation
Contemporary Large Language Models (LLMs) exhibit a high degree of code generation and comprehension capability. A particularly promising area is their ability to interpret code modules from unfamiliar libraries for solving user-instructed…
MAGNIFICo: Evaluating the In-Context Learning Ability of Large Language Models to Generalize to Novel Interpretations
Humans possess a remarkable ability to assign novel interpretations to linguistic expressions, enabling them to learn new words and understand community-specific connotations. However, Large Language Models (LLMs) have a knowledge cutoff a…
Understanding In-Context Learning in Transformers and LLMs by Learning to Learn Discrete Functions
In order to understand the in-context learning phenomenon, recent works have adopted a stylized experimental framework and demonstrated that Transformers can learn gradient-based learning algorithms for various classes of real-valued funct…
Simplicity Bias in Transformers and their Ability to Learn Sparse Boolean Functions
Despite the widespread success of Transformers on NLP tasks, recent works have found that they struggle to model several formal languages when compared to recurrent models. This raises the question of why Transformers perform well in pract…
When Can Transformers Ground and Compose: Insights from Compositional Generalization Benchmarks
Humans can reason compositionally whilst grounding language utterances to the real world. Recent benchmarks like ReaSCAN (Wu et al., 2021) use navigation tasks grounded in a grid world to assess whether neural models exhibit similar capabilities. In this wo…
Revisiting the Compositional Generalization Abilities of Neural Sequence Models
Compositional generalization is a fundamental trait in humans, allowing us to effortlessly combine known phrases to form novel sentences. Recent works have claimed that standard seq-to-seq models severely lack the ability to compositionall…
Are NLP Models really able to Solve Simple Math Word Problems?
The problem of designing NLP solvers for math word problems (MWP) has seen sustained research activity and steady gains in the test accuracy. Since existing solvers achieve high performance on the benchmark datasets for elementary level MW…
On the Computational Power of Transformers and Its Implications in Sequence Modeling
Transformers are being used extensively across several sequence modeling tasks. Significant research effort has been devoted to experimentally probing the inner workings of Transformers. However, our conceptual and theoretical understanding …