Arkil Patel
AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories
Web agents enable users to perform tasks on web browsers through natural language interaction. Evaluating web agent trajectories is an important problem, since it helps us determine whether the agent successfully completed the tasks. Rule…
How to Get Your LLM to Generate Challenging Problems for Evaluation
The pace of evolution of Large Language Models (LLMs) necessitates new approaches for rigorous and comprehensive evaluation. Traditional human annotation is increasingly impracticable due to the complexities and costs involved in generatin…
Investigating Adversarial Trigger Transfer in Large Language Models
Recent work has developed optimization procedures to find token sequences, called adversarial triggers, which can elicit unsafe responses from aligned language models. These triggers are believed to be highly transferable, i.e., a trigger …
Evaluating In-Context Learning of Libraries for Code Generation
Contemporary Large Language Models (LLMs) exhibit a high degree of code generation and comprehension capability. A particularly promising area is their ability to interpret code modules from unfamiliar libraries for solving user-instructed…
MAGNIFICo: Evaluating the In-Context Learning Ability of Large Language Models to Generalize to Novel Interpretations
Humans possess a remarkable ability to assign novel interpretations to linguistic expressions, enabling them to learn new words and understand community-specific connotations. However, Large Language Models (LLMs) have a knowledge cutoff a…
Understanding In-Context Learning in Transformers and LLMs by Learning to Learn Discrete Functions
In order to understand the in-context learning phenomenon, recent works have adopted a stylized experimental framework and demonstrated that Transformers can learn gradient-based learning algorithms for various classes of real-valued funct…
Simplicity Bias in Transformers and their Ability to Learn Sparse Boolean Functions
Despite the widespread success of Transformers on NLP tasks, recent works have found that they struggle to model several formal languages when compared to recurrent models. This raises the question of why Transformers perform well in pract…
When Can Transformers Ground and Compose: Insights from Compositional Generalization Benchmarks
Humans can reason compositionally whilst grounding language utterances to the real world. Recent benchmarks like ReaSCAN (Wu et al., 2021) use navigation tasks grounded in a grid world to assess whether neural models exhibit similar capabilities. In this wo…
Revisiting the Compositional Generalization Abilities of Neural Sequence Models
Compositional generalization is a fundamental trait in humans, allowing us to effortlessly combine known phrases to form novel sentences. Recent works have claimed that standard seq-to-seq models severely lack the ability to compositionall…
Are NLP Models really able to Solve Simple Math Word Problems?
The problem of designing NLP solvers for math word problems (MWP) has seen sustained research activity and steady gains in the test accuracy. Since existing solvers achieve high performance on the benchmark datasets for elementary level MW…
On the Computational Power of Transformers and Its Implications in Sequence Modeling
Transformers are being used extensively across several sequence modeling tasks. Significant research effort has been devoted to experimentally probing the inner workings of Transformers. However, our conceptual and theoretical understanding …