Roberta Răileanu
The Automated LLM Speedrunning Benchmark: Reproducing NanoGPT Improvements
Rapid advancements in large language models (LLMs) have the potential to assist in scientific progress. A critical capability toward this endeavor is the ability to reproduce existing work. To evaluate the ability of AI agents to reproduce…
Sparks of Science: Hypothesis Generation Using Structured Paper Data
Generating novel and creative scientific hypotheses is a cornerstone in achieving Artificial General Intelligence. Large language and reasoning models have the potential to aid in the systematic creation, selection, and validation of scien…
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
We introduce Meta MLGym and MLGym-Bench, a new framework and benchmark for evaluating and developing LLM agents on AI research tasks. This is the first Gym environment for machine learning (ML) tasks, enabling research on reinforcement lea…
MaestroMotif: Skill Design from Artificial Intelligence Feedback
Describing skills in natural language has the potential to provide an accessible way to inject human knowledge about decision-making into an AI system. We present MaestroMotif, a method for AI-assisted skill design, which yields high-perfo…
Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources
Synthetic data generation has recently emerged as a promising approach for enhancing the capabilities of large language models (LLMs) without the need for expensive human annotations. However, existing methods often generate data that can …
DreamCraft: Text-Guided Generation of Functional 3D Environments in Minecraft
Procedural Content Generation (PCG) algorithms enable the automatic generation of complex and diverse artifacts. However, they don't provide high-level control over the generated content and typically require domain expertise. In contrast,…
Teaching Large Language Models to Reason with Reinforcement Learning
Reinforcement Learning from Human Feedback (RLHF) has emerged as a dominant approach for aligning LLM outputs with human preferences. Inspired by the success of RLHF, we study the performance of multiple algorithms that learn from…
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts
As large language models (LLMs) become increasingly prevalent across many real-world applications, understanding and enhancing their robustness to adversarial attacks is of paramount importance. Existing methods for identifying adversarial…
TOOLVERIFIER: Generalization to New Tools via Self-Verification
Teaching language models to use tools is an important milestone towards building general assistants, but remains an open problem. While there has been significant progress on learning to use specific tools via fine-tuning, language models …
The Generalization Gap in Offline Reinforcement Learning
Despite recent progress in offline learning, these methods are still trained and tested on the same environment. In this paper, we compare the generalization abilities of widely used online and offline learning methods such as online reinf…
Generalization to New Sequential Decision Making Tasks with In-Context Learning
Training autonomous agents that can learn new tasks from only a handful of demonstrations is a long-standing problem in machine learning. Recently, transformers have been shown to learn new language or vision tasks without any weight updat…
Understanding the Effects of RLHF on LLM Generalisation and Diversity
Large language models (LLMs) fine-tuned with reinforcement learning from human feedback (RLHF) have been used in some of the most widely deployed AI models to date, such as OpenAI's ChatGPT or Anthropic's Claude. While there has been signi…
Motif: Intrinsic Motivation from Artificial Intelligence Feedback
Exploring rich environments and evaluating one's actions without prior knowledge is immensely challenging. In this paper, we propose Motif, a general method to interface such prior knowledge from a Large Language Model (LLM) with an agent.…
Chain-of-Verification Reduces Hallucination in Large Language Models
Generation of plausible yet incorrect factual information, termed hallucination, is an unsolved issue in large language models. We study the ability of language models to deliberate on the responses they give in order to correct their mist…
Steering Language Generation: Harnessing Contrastive Expert Guidance and Negative Prompting for Coherent and Diverse Synthetic Data Generation
Large Language Models (LLMs) hold immense potential to generate synthetic data of high quality and utility, which has numerous applications from downstream model training to practical data utilisation. However, contemporary models, despite…
Challenges and Applications of Large Language Models
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify the remaining challenges and already fruitful applicatio…
Improving Language Plasticity via Pretraining with Active Forgetting
Pretrained language models (PLMs) are today the primary model for natural language processing. Despite their impressive downstream performance, it can be difficult to apply PLMs to new languages, a barrier to making their capabilities univ…
On the Importance of Exploration for Generalization in Reinforcement Learning
Existing approaches for improving generalization in deep reinforcement learning (RL) have mostly focused on representation learning, neglecting RL-specific aspects such as exploration. We hypothesize that the agent's exploration strategy p…
A Study of Global and Episodic Bonuses for Exploration in Contextual MDPs
Exploration in environments which differ across episodes has received increasing attention in recent years. Current methods use some combination of global novelty bonuses, computed using the agent's entire training experience, and …
Hyperparameters in Reinforcement Learning and How To Tune Them
In order to improve reproducibility, deep reinforcement learning (RL) has been adopting better scientific practices such as standardized evaluation metrics and reporting. However, the process of hyperparameter optimization still varies wid…
MAESTRO: Open-Ended Environment Design for Multi-Agent Reinforcement Learning
Open-ended learning methods that automatically generate a curriculum of increasingly challenging tasks serve as a promising avenue toward generally capable reinforcement learning agents. Existing methods adapt curricula independently over …
Augmented Language Models: a Survey
This survey reviews works in which language models (LMs) are augmented with reasoning skills and the ability to use tools. The former is defined as decomposing a potentially complex task into simpler subtasks while the latter consists in c…
Toolformer: Language Models Can Teach Themselves to Use Tools
Language models (LMs) exhibit remarkable abilities to solve new tasks from just a few examples or textual instructions, especially at scale. They also, paradoxically, struggle with basic functionality, such as arithmetic or factual lookup,…
Building a Subspace of Policies for Scalable Continual Learning
The ability to continuously acquire new knowledge and skills is crucial for autonomous agents. Existing methods are typically based on either fixed-size models that struggle to learn a large number of diverse behaviors, or growing-size mod…
Dungeons and Data: A Large-Scale NetHack Dataset
Recent breakthroughs in the development of agents to solve challenging sequential decision making problems such as Go, StarCraft, or DOTA, have relied on both simulated environments and large-scale datasets. However, progress on this resea…
Exploration via Elliptical Episodic Bonuses
In recent years, a number of reinforcement learning (RL) methods have been proposed to explore complex environments which differ across episodes. In this work, we show that the effectiveness of these methods critically relies on a count-ba…
Insights From the NeurIPS 2021 NetHack Challenge
In this report, we summarize the takeaways from the first NeurIPS 2021 NetHack Challenge. Participants were tasked with developing a program or agent that can win (i.e., 'ascend' in) the popular dungeon-crawler game of NetHack by interacti…
Improving Intrinsic Exploration with Language Abstractions
Reinforcement learning (RL) agents are particularly hard to train when rewards are sparse. One common solution is to use intrinsic rewards to encourage agents to explore their environment. However, recent intrinsic exploration methods ofte…