Roberta Răileanu
The Automated LLM Speedrunning Benchmark: Reproducing NanoGPT Improvements
Rapid advancements in large language models (LLMs) have the potential to assist in scientific progress. A critical capability toward this endeavor is the ability to reproduce existing work. To evaluate the ability of AI agents to reproduce…
Sparks of Science: Hypothesis Generation Using Structured Paper Data
Generating novel and creative scientific hypotheses is a cornerstone in achieving Artificial General Intelligence. Large language and reasoning models have the potential to aid in the systematic creation, selection, and validation of scien…
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
We introduce Meta MLGym and MLGym-Bench, a new framework and benchmark for evaluating and developing LLM agents on AI research tasks. This is the first Gym environment for machine learning (ML) tasks, enabling research on reinforcement lea…
MaestroMotif: Skill Design from Artificial Intelligence Feedback
Describing skills in natural language has the potential to provide an accessible way to inject human knowledge about decision-making into an AI system. We present MaestroMotif, a method for AI-assisted skill design, which yields high-perfo…
Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources
Synthetic data generation has recently emerged as a promising approach for enhancing the capabilities of large language models (LLMs) without the need for expensive human annotations. However, existing methods often generate data that can …
DreamCraft: Text-Guided Generation of Functional 3D Environments in Minecraft
Procedural Content Generation (PCG) algorithms enable the automatic generation of complex and diverse artifacts. However, they don't provide high-level control over the generated content and typically require domain expertise. In contrast,…
Teaching Large Language Models to Reason with Reinforcement Learning
Reinforcement Learning from Human Feedback (RLHF) has emerged as a dominant approach for aligning LLM outputs with human preferences. Inspired by the success of RLHF, we study the performance of multiple algorithms that learn from…
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts
As large language models (LLMs) become increasingly prevalent across many real-world applications, understanding and enhancing their robustness to adversarial attacks is of paramount importance. Existing methods for identifying adversarial…
TOOLVERIFIER: Generalization to New Tools via Self-Verification
Teaching language models to use tools is an important milestone towards building general assistants, but remains an open problem. While there has been significant progress on learning to use specific tools via fine-tuning, language models …
The Generalization Gap in Offline Reinforcement Learning
Despite recent progress in offline learning, these methods are still trained and tested on the same environment. In this paper, we compare the generalization abilities of widely used online and offline learning methods such as online reinf…
Generalization to New Sequential Decision Making Tasks with In-Context Learning
Training autonomous agents that can learn new tasks from only a handful of demonstrations is a long-standing problem in machine learning. Recently, transformers have been shown to learn new language or vision tasks without any weight updat…
Understanding the Effects of RLHF on LLM Generalisation and Diversity
Large language models (LLMs) fine-tuned with reinforcement learning from human feedback (RLHF) have been used in some of the most widely deployed AI models to date, such as OpenAI's ChatGPT or Anthropic's Claude. While there has been signi…
Motif: Intrinsic Motivation from Artificial Intelligence Feedback
Exploring rich environments and evaluating one's actions without prior knowledge is immensely challenging. In this paper, we propose Motif, a general method to interface such prior knowledge from a Large Language Model (LLM) with an agent.…
Chain-of-Verification Reduces Hallucination in Large Language Models
Generation of plausible yet incorrect factual information, termed hallucination, is an unsolved issue in large language models. We study the ability of language models to deliberate on the responses they give in order to correct their mist…
Steering Language Generation: Harnessing Contrastive Expert Guidance and Negative Prompting for Coherent and Diverse Synthetic Data Generation
Large Language Models (LLMs) hold immense potential to generate synthetic data of high quality and utility, which has numerous applications from downstream model training to practical data utilisation. However, contemporary models, despite…
Challenges and Applications of Large Language Models
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify the remaining challenges and already fruitful applicatio…
Improving Language Plasticity via Pretraining with Active Forgetting
Pretrained language models (PLMs) are today the primary model for natural language processing. Despite their impressive downstream performance, it can be difficult to apply PLMs to new languages, a barrier to making their capabilities univ…
On the Importance of Exploration for Generalization in Reinforcement Learning
Existing approaches for improving generalization in deep reinforcement learning (RL) have mostly focused on representation learning, neglecting RL-specific aspects such as exploration. We hypothesize that the agent's exploration strategy p…
A Study of Global and Episodic Bonuses for Exploration in Contextual MDPs
Exploration in environments which differ across episodes has received increasing attention in recent years. Current methods use some combination of global novelty bonuses, computed using the agent's entire training experience, and …
Hyperparameters in Reinforcement Learning and How To Tune Them
In order to improve reproducibility, deep reinforcement learning (RL) has been adopting better scientific practices such as standardized evaluation metrics and reporting. However, the process of hyperparameter optimization still varies wid…
MAESTRO: Open-Ended Environment Design for Multi-Agent Reinforcement Learning
Open-ended learning methods that automatically generate a curriculum of increasingly challenging tasks serve as a promising avenue toward generally capable reinforcement learning agents. Existing methods adapt curricula independently over …
Augmented Language Models: a Survey
This survey reviews works in which language models (LMs) are augmented with reasoning skills and the ability to use tools. The former is defined as decomposing a potentially complex task into simpler subtasks while the latter consists in c…
Toolformer: Language Models Can Teach Themselves to Use Tools
Language models (LMs) exhibit remarkable abilities to solve new tasks from just a few examples or textual instructions, especially at scale. They also, paradoxically, struggle with basic functionality, such as arithmetic or factual lookup,…
Building a Subspace of Policies for Scalable Continual Learning
The ability to continuously acquire new knowledge and skills is crucial for autonomous agents. Existing methods are typically based on either fixed-size models that struggle to learn a large number of diverse behaviors, or growing-size mod…
Dungeons and Data: A Large-Scale NetHack Dataset
Recent breakthroughs in the development of agents to solve challenging sequential decision making problems such as Go, StarCraft, or DOTA, have relied on both simulated environments and large-scale datasets. However, progress on this resea…
Exploration via Elliptical Episodic Bonuses
In recent years, a number of reinforcement learning (RL) methods have been proposed to explore complex environments which differ across episodes. In this work, we show that the effectiveness of these methods critically relies on a count-ba…
Insights From the NeurIPS 2021 NetHack Challenge
In this report, we summarize the takeaways from the first NeurIPS 2021 NetHack Challenge. Participants were tasked with developing a program or agent that can win (i.e., 'ascend' in) the popular dungeon-crawler game of NetHack by interacti…
Improving Intrinsic Exploration with Language Abstractions
Reinforcement learning (RL) agents are particularly hard to train when rewards are sparse. One common solution is to use intrinsic rewards to encourage agents to explore their environment. However, recent intrinsic exploration methods ofte…