Karthik Narasimhan
Persona-Driven Benchmarking for Generalizable and Human-Aware Artificial General Intelligence
This research paper, "Persona-Driven Benchmarking for Generalizable and Human-Aware Artificial General Intelligence," proposes a novel architectural solution to transition current Large Language Models (LLMs) from sophisticated pattern-mat…
Probing AI Safety with Source Code
Large language models (LLMs) have become ubiquitous, interfacing with humans in numerous safety-critical applications. This necessitates improving their capabilities, but, importantly, coupling those improvements with greater safety measures to align these models wit…
RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs
A significant challenge in training large language models (LLMs) as effective assistants is aligning them with human preferences. Reinforcement learning from human feedback (RLHF) has emerged as a promising solution. However, our understan…
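As a reference point for what RLHF optimizes, the commonly used KL-regularized objective is sketched below; this is the standard formulation from the literature, not necessarily the paper's own notation.

```latex
% Standard KL-regularized RLHF objective: maximize the reward model's score
% while keeping the policy close to the supervised reference policy.
\[
\max_{\pi_\theta}\;
\mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_\theta(\cdot \mid x)}
\big[\, r_\phi(x, y) \,\big]
\;-\;
\beta\, \mathbb{D}_{\mathrm{KL}}\!\big( \pi_\theta(\cdot \mid x) \,\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \big)
\]
```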
Agent Context Protocols Enhance Collective Inference
AI agents have become increasingly adept at complex tasks such as coding, reasoning, and multimodal understanding. However, building generalist systems requires moving beyond individual agents to collective inference -- a paradigm where mu…
Contextual Experience Replay for Self-Improvement of Language Agents
An Annotated Dataset of Errors in Premodern Greek and Baselines for Detecting Them
LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks
Low-Rank Adaptation (LoRA) is a popular technique for parameter-efficient fine-tuning of Large Language Models (LLMs). We study how different LoRA modules can be merged to achieve skill composition -- testing the performance of the merged …
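As a rough illustration of skill composition by merging, here is a minimal sketch that combines two LoRA modules through a weighted average of their low-rank updates; the function, data layout, and averaging scheme are illustrative assumptions, since the paper compares several merging strategies.

```python
import torch

def merge_loras_weighted(loras, weights):
    """Merge LoRA modules via a weighted average of their low-rank updates.

    Each entry of `loras` maps a layer name to an (A, B) pair such that the
    module's update to that layer's weight is B @ A. Weighted averaging is
    only one possible merging scheme, used here for illustration.
    """
    merged = {}
    for name in loras[0]:
        # Sum the full-rank updates delta_W = B @ A with the given weights.
        merged[name] = sum(w * (lora[name][1] @ lora[name][0])
                           for w, lora in zip(weights, loras))
    return merged

# Hypothetical usage: combine a "math" LoRA and a "coding" LoRA with equal weight.
rank, d_in, d_out = 8, 768, 768
math_lora = {"attn.q_proj": (torch.randn(rank, d_in), torch.randn(d_out, rank))}
code_lora = {"attn.q_proj": (torch.randn(rank, d_in), torch.randn(d_out, rank))}
merged_updates = merge_loras_weighted([math_lora, code_lora], weights=[0.5, 0.5])
```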
SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?
Autonomous systems for software engineering are now capable of fixing bugs and developing features. These systems are commonly evaluated on SWE-bench (Jimenez et al., 2024a), which assesses their ability to solve software issues from GitHu…
EnIGMA: Interactive Tools Substantially Assist LM Agents in Finding Security Vulnerabilities
Although language model (LM) agents have demonstrated increased performance in multiple domains, including coding and web-browsing, their success in cybersecurity has been limited. We present EnIGMA, an LM agent for autonomously solving Ca…
LLMs are Superior Feedback Providers: Bootstrapping Reasoning for Lie Detection with Self-Generated Feedback
Large Language Models (LLMs) excel at generating human-like dialogues and comprehending text. However, understanding the subtleties of complex exchanges in language remains a challenge. We propose a bootstrapping framework that leverages s…
ShieldGemma: Generative AI Content Moderation Based on Gemma
We present ShieldGemma, a comprehensive suite of LLM-based safety content moderation models built upon Gemma2. These models provide robust, state-of-the-art predictions of safety risks across key harm types (sexually explicit, dangerous co…
PersonaGym: Evaluating Persona Agents and LLMs
Persona agents, which are LLM agents conditioned to act according to an assigned persona, enable contextually rich and user-aligned interactions across domains like education and healthcare. However, evaluating how faithfully these agents …
τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
Existing benchmarks do not test language agents on their interaction with human users or ability to follow domain-specific rules, both of which are vital for deploying them in real-world applications. We propose τ-bench, a benchmark emul…
SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering
Language model (LM) agents are increasingly being used to automate complicated tasks in digital environments. Just as humans benefit from powerful software applications, such as integrated development environments, for complex tasks like s…
Can Language Models Solve Olympiad Programming?
Computing olympiads contain some of the most challenging problems for humans, requiring complex algorithmic reasoning and puzzle solving in addition to generating efficient code. However, they have been understudied as a domain to evaluate lang…
RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs
State-of-the-art large language models (LLMs) have become indispensable tools for various tasks. However, training LLMs to serve as effective assistants for humans requires careful consideration. A promising approach is reinforcement learn…
Language-Guided World Models: A Model-Based Approach to AI Control
This paper introduces the concept of Language-Guided World Models (LWMs) -- probabilistic models that can simulate environments by reading texts. Agents equipped with these models provide humans with more extensive and efficient control, a…
GEO: Generative Engine Optimization
The advent of large language models (LLMs) has ushered in a new paradigm of search engines that use generative models to gather and summarize information to answer user queries. This emerging technology, which we formalize under the unifie…
QualEval: Qualitative Evaluation for Model Improvement
Quantitative evaluation metrics have traditionally been pivotal in gauging the advancements of artificial intelligence systems, including large language models (LLMs). However, these metrics have inherent limitations. Given the intricate n…
Progressively Efficient Learning
Assistant AI agents should be capable of rapidly acquiring novel skills and adapting to new user preferences. Traditional frameworks like imitation learning and reinforcement learning do not facilitate this capability because they support …
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Language models have outpaced our ability to evaluate them effectively, but for their future development it is essential to study the frontier of their capabilities. We find real-world software engineering to be a rich, sustainable, and ch…
FireAct: Toward Language Agent Fine-tuning
Recent efforts have augmented language models (LMs) with external tools or environments, leading to the development of language agents that can reason and act. However, most of these agents rely on few-shot prompting techniques with off-th…
Cognitive Architectures for Language Agents
Recent efforts have augmented large language models (LLMs) with external resources (e.g., the Internet) or internal control flows (e.g., prompt chaining) for tasks requiring grounding or reasoning, leading to a new class of language agents…
Scaling Laws for Imitation Learning in Single-Agent Games
Imitation Learning (IL) is one of the most widely used methods in machine learning. Yet, many works find it is often unable to fully recover the underlying expert behavior, even in constrained environments like single-agent games. However,…
COLLIE: Systematic Construction of Constrained Text Generation Tasks
Text generation under constraints has seen increasing interest in natural language processing, especially with the rapidly improving capabilities of large language models. However, existing benchmarks for constrained generation usually f…
InstructEval: Systematic Evaluation of Instruction Selection Methods
In-context learning (ICL) performs tasks by prompting a large language model (LLM) using an instruction and a small set of annotated examples called demonstrations. Recent work has shown that precise details of the inputs used in the ICL p…
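For context on what an "instruction plus demonstrations" prompt looks like, here is a minimal sketch of ICL prompt assembly; the template and function name are illustrative assumptions, not InstructEval's actual evaluation harness.

```python
def build_icl_prompt(instruction, demonstrations, query):
    """Assemble an in-context learning prompt: an instruction followed by
    annotated (input, label) demonstrations and the test input.

    The exact template is a simplifying assumption; the broader point is that
    choices such as the instruction text itself can change ICL accuracy.
    """
    parts = [instruction.strip()]
    for x, y in demonstrations:
        parts.append(f"Input: {x}\nOutput: {y}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

# Hypothetical usage for a sentiment classification task.
prompt = build_icl_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [("Great movie!", "positive"), ("Terrible pacing.", "negative")],
    "The plot was a pleasant surprise.",
)
```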
InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback
Humans write code in a fundamentally interactive manner and rely on constant execution feedback to correct errors, resolve ambiguities, and decompose tasks. While LLMs have recently exhibited promising coding capabilities, current coding b…
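A minimal sketch of the interactive, execution-feedback loop described here is shown below; `generate_code` is a hypothetical LLM wrapper, and InterCode's actual environments, observations, and reward functions are richer than this.

```python
import subprocess

def interactive_coding_loop(generate_code, task, max_turns=5):
    """Toy execution-feedback loop: the model proposes code, we run it, and
    the observation (stdout/stderr) is fed back for the next attempt."""
    code, feedback = "", ""
    for _ in range(max_turns):
        code = generate_code(task, feedback)
        result = subprocess.run(
            ["python", "-c", code], capture_output=True, text=True, timeout=10
        )
        if result.returncode == 0:
            return code, result.stdout
        feedback = result.stderr  # error trace becomes the next observation
    return code, feedback
```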
PruMUX: Augmenting Data Multiplexing with Model Compression
As language models increase in size by the day, methods for efficient inference are critical to leveraging their capabilities for various applications. Prior work has investigated techniques like model pruning, knowledge distillation, and …
Referral Augmentation for Zero-Shot Information Retrieval
We propose Referral-Augmented Retrieval (RAR), a simple technique that concatenates document indices with referrals, i.e. text from other documents that cite or link to the given document, to provide significant performance gains for zero-…
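As a sketch of the core idea, the snippet below appends referral text from citing documents to each document before indexing; the data structures and the simple concatenation strategy are illustrative assumptions rather than the paper's exact pipeline.

```python
def build_referral_augmented_index(docs, citations):
    """Augment each document with referrals, i.e., text from other documents
    that cite or link to it, so a retriever indexes both the document and how
    others describe it.

    `docs` maps doc_id -> text; `citations` maps doc_id -> list of referral
    strings harvested from citing documents.
    """
    augmented = {}
    for doc_id, text in docs.items():
        referrals = " ".join(citations.get(doc_id, []))
        augmented[doc_id] = f"{text} {referrals}".strip()
    return augmented

# Hypothetical usage: feed the augmented texts to a lexical retriever such as BM25.
docs = {"d1": "We introduce the Transformer architecture."}
citations = {"d1": ["Vaswani et al. propose attention-based sequence transduction."]}
index_texts = build_referral_augmented_index(docs, citations)
```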