Jonas Hübotter
Learning on the Job: Test-Time Curricula for Targeted Reinforcement Learning
Humans are good at learning on the job: We learn how to solve the tasks we face as we go along. Can a model do the same? We propose an agent that assembles a task-specific curriculum, called test-time curriculum (TTC-RL), and applies reinf…
Maximizing Prefix-Confidence at Test-Time Efficiently Improves Mathematical Reasoning
Recent work has shown that language models can self-improve by maximizing their own confidence in their predictions, without relying on external verifiers or reward signals. In this work, we study the test-time scaling of language models f…
Test-time Offline Reinforcement Learning on Goal-related Experience
Foundation models compress a large amount of information in a single, large neural network, which can then be queried for individual tasks. There are strong parallels between this widespread framework and offline goal-conditioned reinforce…
DISCOVER: Automated Curricula for Sparse-Reward Reinforcement Learning
Sparse-reward reinforcement learning (RL) can model a wide range of highly complex tasks. Solving sparse-reward tasks is RL's core premise, requiring efficient exploration coupled with long-horizon credit assignment, and overcoming these c…
Probabilistic Artificial Intelligence
Artificial intelligence commonly refers to the science and engineering of artificial systems that can carry out tasks generally associated with requiring aspects of human intelligence, such as playing games, translating languages, and driv…
LITE: Efficiently Estimating Gaussian Probability of Maximality
We consider the problem of computing the probability of maximality (PoM) of a Gaussian random vector, i.e., the probability for each dimension to be maximal. This is a key challenge in applications ranging from Bayesian optimization to rei…
Local Mixtures of Experts: Essentially Free Test-Time Training via Model Merging
Mixture of expert (MoE) models are a promising approach to increasing model capacity without increasing inference cost, and are core components of many state-of-the-art language models. However, current MoE models typically use only few ex…
Active Fine-Tuning of Multi-Task Policies
Pre-trained generalist policies are rapidly gaining relevance in robot learning due to their promise of fast adaptation to novel, in-domain tasks. This adaptation often relies on collecting new demonstrations for a specific task of interes…
Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs
Recent efforts in fine-tuning language models often rely on automatic data selection, commonly using Nearest Neighbors retrieval from large datasets. However, we theoretically show that this approach tends to select redundant data, limitin…
Transductive Active Learning: Theory and Applications
We study a generalization of classical active learning to real-world settings with concrete prediction targets where sampling is restricted to an accessible region of the domain, while prediction targets may lie outside this region. We ana…
Active Few-Shot Fine-Tuning
We study the question: How can we select the right data for fine-tuning to a specific task? We call this data selection problem active fine-tuning and show that it is an instance of transductive active learning, a novel generalization of c…
Efficient Exploration in Continuous-time Model-based Reinforcement Learning
Reinforcement learning algorithms typically consider discrete-time dynamics, even though the underlying systems are often continuous in time. In this paper, we introduce a model-based reinforcement learning algorithm that represents contin…
Tuning Legged Locomotion Controllers via Safe Bayesian Optimization
This paper presents a data-driven strategy to streamline the deployment of model-based controllers in legged robotic hardware platforms. Our approach leverages a model-free safe learning algorithm to automate the tuning of control gains, a…
Implementation of Algorithms for Right-Sizing Data Centers
The energy consumption of data centers assumes a significant fraction of the world's overall energy consumption. Most data centers are statically provisioned, leading to a very low average utilization of servers. In this work, we survey un…