Aaron Klein
Where to Begin: Efficient Pretraining via Subnetwork Selection and Distillation
Small language models (SLMs) offer an efficient and accessible alternative to Large Language Models (LLMs), delivering strong performance while using far fewer resources. We introduce a simple and effective framework for pretraining SLMs t…
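The title points to knowledge distillation from a larger teacher into a selected subnetwork. The sketch below is a minimal, hedged illustration of a standard distillation objective (soft-target KL plus cross-entropy); the temperature, loss weighting, and how the student subnetwork is chosen are assumptions for illustration, not the paper's exact recipe.

```python
# Minimal sketch of a distillation loss for pretraining a small student model.
# Assumptions: teacher and student expose logits over the same vocabulary;
# temperature T and mixing weight alpha are illustrative choices.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=2.0, alpha=0.5):
    # Soft-target term: KL between temperature-scaled teacher and student distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy against the ground-truth tokens.
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage with random tensors (batch of 4, vocabulary of 10).
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)
targets = torch.randint(0, 10, (4,))
loss = distillation_loss(student_logits, teacher_logits, targets)
loss.backward()
```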
Improving LLM-based Global Optimization with Search Space Partitioning
Large Language Models (LLMs) have recently emerged as effective surrogate models and candidate generators within global optimization frameworks for expensive blackbox functions. Despite promising results, LLM-based methods often struggle i…
Hyperband-based Bayesian Optimization for Black-box Prompt Selection
Optimal prompt selection is crucial for maximizing large language model (LLM) performance on downstream tasks, especially in black-box settings where models are only accessible via APIs. Black-box prompt selection is challenging due to pot…
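As a rough illustration of the multi-fidelity idea behind Hyperband in this setting, the sketch below runs successive halving over a pool of candidate prompts, scoring each on progressively larger slices of a validation set. The `score_prompt` function and the budget schedule are placeholders for demonstration, not the paper's method.

```python
# Successive halving over candidate prompts: score all prompts on a small
# validation slice, keep the best fraction, and repeat with a larger slice.
# `score_prompt` is a placeholder; in practice it would call the target LLM API.
import random

def score_prompt(prompt, examples):
    # Placeholder metric standing in for task accuracy on the given examples.
    random.seed(hash((prompt, len(examples))) % (2**32))
    return random.random()

def successive_halving(prompts, validation_set, min_budget=8, eta=2):
    budget = min_budget
    candidates = list(prompts)
    while len(candidates) > 1 and budget <= len(validation_set):
        scores = {p: score_prompt(p, validation_set[:budget]) for p in candidates}
        # Keep the top 1/eta fraction of prompts for the next, larger budget.
        keep = max(1, len(candidates) // eta)
        candidates = sorted(candidates, key=scores.get, reverse=True)[:keep]
        budget *= eta
    return candidates[0]

prompts = [f"Answer concisely (variant {i}): " for i in range(8)]
validation_set = list(range(128))  # stand-in for labeled validation examples
print(successive_halving(prompts, validation_set))
```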
Hyperparameter Optimization in Machine Learning
Hyperparameters are configuration variables controlling the behavior of machine learning algorithms. They are ubiquitous in machine learning and artificial intelligence and the choice of their values determines the effectiveness of systems…
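For readers new to the topic, the snippet below shows the simplest practical form of hyperparameter optimization, random sampling with cross-validation, using scikit-learn; the model and search space are illustrative choices, not taken from the book.

```python
# Random-search hyperparameter optimization with cross-validation (scikit-learn).
# The search space below is an illustrative example for a random forest.
from scipy.stats import randint
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_digits(return_X_y=True)
param_distributions = {
    "n_estimators": randint(50, 300),
    "max_depth": randint(3, 20),
    "min_samples_split": randint(2, 11),
}
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=20, cv=3, random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```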
Compressing Large Language Models with Automated Sub-Network Search
Large Language Models (LLMs) demonstrate exceptional reasoning abilities, enabling strong generalization across diverse tasks such as commonsense reasoning and instruction following. However, as LLMs scale, inference costs become increasin…
Structural Pruning of Pre-trained Language Models via Neural Architecture Search
Pre-trained language models (PLMs), for example BERT or RoBERTa, mark the state of the art for natural language understanding tasks when fine-tuned on labeled data. However, their large size poses challenges in deploying them for inference i…
Obeying the Order: Introducing Ordered Transfer Hyperparameter Optimisation
We introduce ordered transfer hyperparameter optimisation (OTHPO), a version of transfer learning for hyperparameter optimisation (HPO) where the tasks follow a sequential order. Unlike for state-of-the-art transfer HPO, the assumption is …
Optimizing Hyperparameters with Conformal Quantile Regression
Many state-of-the-art hyperparameter optimization (HPO) algorithms rely on model-based optimizers that learn surrogate models of the target function to guide the search. Gaussian processes are the de facto surrogate model due to their abil…
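To make the surrogate-modeling idea concrete, here is a hedged sketch of conformalized quantile regression with scikit-learn gradient boosting: two quantile models give a raw interval, and a held-out calibration set widens it to reach the target coverage. The model choice and quantile levels are illustrative, not the paper's surrogate.

```python
# Conformalized quantile regression: fit lower/upper quantile regressors,
# then calibrate the interval width on a held-out set (split-conformal style).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(600, 1))
y = np.sin(X[:, 0]) + 0.3 * rng.standard_normal(600)
X_train, X_cal, y_train, y_cal = train_test_split(X, y, test_size=0.3, random_state=0)

alpha = 0.1  # target miscoverage (90% intervals)
lo = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2).fit(X_train, y_train)
hi = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2).fit(X_train, y_train)

# Conformity scores: how far calibration points fall outside the raw interval.
scores = np.maximum(lo.predict(X_cal) - y_cal, y_cal - hi.predict(X_cal))
q = np.quantile(scores, np.ceil((1 - alpha) * (len(y_cal) + 1)) / len(y_cal))

X_new = np.array([[0.5]])
interval = (lo.predict(X_new) - q, hi.predict(X_new) + q)
print(interval)
```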
HPOBench: A Collection of Reproducible Multi-Fidelity Benchmark Problems for HPO
To achieve peak predictive performance, hyperparameter optimization (HPO) is a crucial component of machine learning and its applications. Over recent years, the number of efficient algorithms and tools for HPO has grown substantially. At the…
Online Optimization of Stimulation Speed in an Auditory Brain-Computer Interface under Time Constraints
The decoding of brain signals recorded via, e.g., an electroencephalogram, using machine learning is key to brain-computer interfaces (BCIs). Stimulation parameters or other experimental settings of the BCI protocol typically are chosen ac…
Overfitting in Bayesian Optimization: an empirical study and early-stopping solution
Tuning machine learning models with Bayesian optimization (BO) is a successful strategy to find good hyperparameters. BO defines an iterative procedure where a cross-validated metric is evaluated on promising hyperparameters. In practice, …
Automatic Termination for Hyperparameter Optimization
Bayesian optimization (BO) is a widely popular approach for the hyperparameter optimization (HPO) in machine learning. At its core, BO iteratively evaluates promising configurations until a user-defined budget, such as wall-clock time or n…
Hyperparameter Transfer Learning with Adaptive Complexity
Bayesian optimization (BO) is a sample-efficient approach to automatically tune the hyperparameters of machine learning models. In practice, one frequently has to solve similar hyperparameter tuning problems sequentially. For example, one …
BORE: Bayesian Optimization by Density-Ratio Estimation
Bayesian optimization (BO) is among the most effective and widely-used blackbox optimization methods. BO proposes solutions according to an explore-exploit trade-off criterion encoded in an acquisition function, many of which are computed …
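The core trick in BORE is that a common acquisition function can be rewritten as a class-posterior probability, so any binary classifier can stand in for the density ratio. The sketch below is a hedged, simplified illustration with a random forest: the classifier choice, the top-quantile threshold gamma, and the candidate sampling are assumptions, not the paper's exact configuration.

```python
# BORE-style acquisition: label the best gamma-fraction of observed points as
# positives, train a classifier, and use its predicted probability as the
# acquisition function when proposing the next point to evaluate.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def objective(x):
    return (x[:, 0] - 0.3) ** 2 + 0.1 * np.sin(8 * x[:, 0])  # toy 1-D blackbox

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(20, 1))           # initial random design
y = objective(X)
gamma = 0.25                                   # fraction treated as "good"

for _ in range(20):
    threshold = np.quantile(y, gamma)
    labels = (y <= threshold).astype(int)      # 1 = among the best points so far
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
    candidates = rng.uniform(0, 1, size=(512, 1))
    probs = clf.predict_proba(candidates)[:, 1]
    x_next = candidates[np.argmax(probs)][None, :]
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next))

print("best found:", X[np.argmin(y), 0], y.min())
```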
Model-based Asynchronous Hyperparameter and Neural Architecture Search
We introduce a model-based asynchronous multi-fidelity method for hyperparameter and neural architecture search that combines the strengths of asynchronous Hyperband and Gaussian process-based Bayesian optimization. At the heart of our met…
Efficient Bayesian Hyperparameter Optimization
Automated machine learning has emerged as a research field within machine learning that aims to progressively automate steps of common machine learning pipelines that are traditionally executed by humans. One of its core tas…
Probabilistic Rollouts for Learning Curve Extrapolation Across Hyperparameter Settings
We propose probabilistic models that can extrapolate learning curves of iterative machine learning algorithms, such as stochastic gradient descent for training deep networks, based on training data with variable-length learning curves. We …
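As a hedged illustration of the extrapolation task (not the probabilistic rollout model itself), the snippet below fits a simple power-law form to the observed part of a learning curve with SciPy and extrapolates it to later epochs; the functional form and starting values are common choices assumed here for demonstration.

```python
# Fit a power-law learning-curve model  y(t) = a - b * t**(-c)  to the first
# epochs of a run and extrapolate it to predict later performance.
import numpy as np
from scipy.optimize import curve_fit

def power_law(t, a, b, c):
    return a - b * np.power(t, -c)

rng = np.random.default_rng(0)
epochs = np.arange(1, 51)
true_curve = 0.92 - 0.4 * epochs ** -0.7
observed = true_curve + 0.005 * rng.standard_normal(epochs.size)

# Fit on the first 15 epochs only, then extrapolate to epoch 50.
t_fit, y_fit = epochs[:15], observed[:15]
params, _ = curve_fit(power_law, t_fit, y_fit, p0=[0.9, 0.5, 0.5], maxfev=10000)
print("predicted accuracy at epoch 50:", power_law(50, *params))
print("observed accuracy at epoch 50:", observed[-1])
```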
Meta-Surrogate Benchmarking for Hyperparameter Optimization
Despite the recent progress in hyperparameter optimization (HPO), available benchmarks that resemble real-world scenarios consist of only a few, very large problem instances that are expensive to solve. This blocks researchers and practition…
Tabular Benchmarks for Joint Architecture and Hyperparameter Optimization
Due to the high computational demands, executing a rigorous comparison between hyperparameter optimization (HPO) methods is often cumbersome. The goal of this paper is to facilitate a better empirical evaluation of HPO methods by providing …
NAS-Bench-101: Towards Reproducible Neural Architecture Search
Recent advances in neural architecture search (NAS) demand tremendous computational resources, which makes it difficult to reproduce experiments and imposes a barrier-to-entry to researchers without access to large-scale computation. We ai…
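For orientation, here is a rough usage sketch of the public NAS-Bench-101 API released with the paper; the dataset file name, cell encoding, and returned fields are reproduced from memory and should be checked against the repository.

```python
# Querying NAS-Bench-101 (sketch; file name and fields as recalled from the release).
# A cell is an upper-triangular adjacency matrix plus a list of node operations.
from nasbench import api

nasbench = api.NASBench("nasbench_only108.tfrecord")

cell = api.ModelSpec(
    matrix=[[0, 1, 1, 0, 0, 0, 1],   # edges leaving the input node
            [0, 0, 0, 1, 0, 0, 0],
            [0, 0, 0, 0, 1, 0, 0],
            [0, 0, 0, 0, 0, 1, 0],
            [0, 0, 0, 0, 0, 0, 1],
            [0, 0, 0, 0, 0, 0, 1],
            [0, 0, 0, 0, 0, 0, 0]],  # output node has no outgoing edges
    ops=["input", "conv3x3-bn-relu", "conv1x1-bn-relu", "maxpool3x3",
         "conv3x3-bn-relu", "conv3x3-bn-relu", "output"],
)
data = nasbench.query(cell)          # tabulated training statistics for this cell
print(data["validation_accuracy"], data["training_time"])
```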
Towards Automatically-Tuned Deep Neural Networks
Recent advances in AutoML have led to automated tools that can compete with machine learning experts on supervised learning tasks. In this work, we present two versions of Auto-Net, which provide automatically-tuned deep neural networks wi…
Auto-sklearn: Efficient and Robust Automated Machine Learning
The success of machine learning in a broad range of applications has led to an ever-growing demand for machine learning systems that can be used off the shelf by non-experts. To be effective in practice, such systems need to automatically …
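As a brief usage illustration, the snippet below runs the auto-sklearn classifier on a small dataset; the time budgets are arbitrary, and the constructor arguments reflect the 0.x releases of the package, so they may need checking against the installed version.

```python
# Off-the-shelf AutoML with auto-sklearn: fit an ensemble of pipelines within a
# fixed time budget and use it like any scikit-learn estimator.
import autosklearn.classification
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=300,   # total budget in seconds (illustrative)
    per_run_time_limit=30,         # budget per candidate pipeline
)
automl.fit(X_train, y_train)
print(accuracy_score(y_test, automl.predict(X_test)))
```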
Towards Automated Deep Learning: Efficient Joint Neural Architecture and Hyperparameter Search
While existing work on neural architecture search (NAS) tunes hyperparameters in a separate post-processing step, we demonstrate that architectural choices and other hyperparameter settings interact in a way that can render this separation…
BOHB: Robust and Efficient Hyperparameter Optimization at Scale
Modern deep learning methods are very sensitive to many hyperparameters, and, due to the long training times of state-of-the-art models, vanilla Bayesian hyperparameter optimization is typically computationally infeasible. On the other han…
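BOHB keeps Hyperband's budget schedule and only replaces its random sampling with a model-based proposal. The short sketch below prints that bracket schedule (how many configurations start at which budget) for a given minimum and maximum budget and halving rate eta; the numbers are illustrative, and the model-based sampling itself is omitted.

```python
# Hyperband bracket schedule: for each bracket s, start n configurations at
# budget r and repeatedly keep the top 1/eta while multiplying the budget by eta.
import math

def hyperband_schedule(min_budget=1, max_budget=81, eta=3):
    s_max = int(math.log(max_budget / min_budget, eta))
    for s in range(s_max, -1, -1):
        n = int(math.ceil((s_max + 1) / (s + 1) * eta ** s))  # initial configs
        r = max_budget * eta ** (-s)                          # initial budget
        rounds = [(int(n * eta ** (-i)), r * eta ** i) for i in range(s + 1)]
        print(f"bracket s={s}: " +
              ", ".join(f"{k} cfgs @ budget {b:g}" for k, b in rounds))

hyperband_schedule()
```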
Uncertainty Estimates for Optical Flow with Multi-Hypotheses Networks
Recent work has shown that optical flow estimation can be formulated as an end-to-end supervised learning problem, which yields estimates with a superior accuracy-runtime tradeoff compared to alternative methodology. In this paper, we make…
Uncertainty Estimates and Multi-Hypotheses Networks for Optical Flow
Optical flow estimation can be formulated as an end-to-end supervised learning problem, which yields estimates with a superior accuracy-runtime tradeoff compared to alternative methodology. In this paper, we make such networks estimate the…
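One simple way to read the multi-hypotheses idea: if a network emits several flow hypotheses per pixel, their spread can serve as a per-pixel uncertainty estimate. The NumPy sketch below computes such a mean/variance aggregate; the learned merging network used in the papers is replaced here by plain averaging, purely for illustration.

```python
# Aggregate M flow hypotheses per pixel into a point estimate and an
# uncertainty map (variance across hypotheses), as a simplified stand-in
# for the learned merging network described in the paper.
import numpy as np

M, H, W = 8, 64, 64                          # hypotheses, image height, width
rng = np.random.default_rng(0)
hypotheses = rng.normal(size=(M, H, W, 2))   # (u, v) flow vectors per hypothesis

flow_mean = hypotheses.mean(axis=0)          # point estimate, shape (H, W, 2)
flow_var = hypotheses.var(axis=0).sum(-1)    # scalar uncertainty per pixel, shape (H, W)
print(flow_mean.shape, flow_var.shape)
```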
Fast Bayesian hyperparameter optimization on large datasets
Bayesian optimization has become a successful tool for optimizing the hyperparameters of machine learning algorithms, such as support vector machines or deep neural networks. Despite its success, for large datasets, training and validating…
The Sacred Infrastructure for Computational Research
We present a toolchain for computational research consisting of Sacred and two supporting tools. Sacred is an open source Python framework which aims to provide basic infrastructure for running computational experiments independent of the …
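To give a feel for the framework, here is a minimal Sacred experiment with a tracked config and a file-storage observer that writes run metadata to disk; the observer path, config values, and model are arbitrary examples, and the constructor style matches recent Sacred releases.

```python
# Minimal Sacred experiment: configuration values are tracked automatically and
# each run (config, metrics, sources) is written to the ./runs directory.
from sacred import Experiment
from sacred.observers import FileStorageObserver

ex = Experiment("svm_demo")
ex.observers.append(FileStorageObserver("runs"))

@ex.config
def config():
    C = 1.0          # regularization strength (tracked by Sacred)
    kernel = "rbf"   # kernel choice (tracked by Sacred)

@ex.automain
def run(C, kernel, _run):
    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC
    score = cross_val_score(SVC(C=C, kernel=kernel), *load_iris(return_X_y=True)).mean()
    _run.log_scalar("cv_accuracy", score)  # metric stored by the observer
    return score
```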