Aaron Klein
Where to Begin: Efficient Pretraining via Subnetwork Selection and Distillation
Small language models (SLMs) offer an efficient and accessible alternative to Large Language Models (LLMs), delivering strong performance while using far fewer resources. We introduce a simple and effective framework for pretraining SLMs t…
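The title points to knowledge distillation from a larger teacher into a selected subnetwork. The sketch below is a minimal, hedged illustration of a standard distillation objective (soft-target KL plus cross-entropy); the temperature, loss weighting, and how the student subnetwork is chosen are assumptions for illustration, not the paper's exact recipe.

```python
# Minimal sketch of a distillation loss for pretraining a small student model.
# Assumptions: teacher and student expose logits over the same vocabulary;
# temperature T and mixing weight alpha are illustrative choices.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=2.0, alpha=0.5):
    # Soft-target term: KL between temperature-scaled teacher and student distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy against the ground-truth tokens.
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage with random tensors (batch of 4, vocabulary of 10).
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)
targets = torch.randint(0, 10, (4,))
loss = distillation_loss(student_logits, teacher_logits, targets)
loss.backward()
```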
Improving LLM-based Global Optimization with Search Space Partitioning
Large Language Models (LLMs) have recently emerged as effective surrogate models and candidate generators within global optimization frameworks for expensive blackbox functions. Despite promising results, LLM-based methods often struggle i…
Hyperband-based Bayesian Optimization for Black-box Prompt Selection
Optimal prompt selection is crucial for maximizing large language model (LLM) performance on downstream tasks, especially in black-box settings where models are only accessible via APIs. Black-box prompt selection is challenging due to pot…
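As a rough illustration of the multi-fidelity idea behind Hyperband in this setting, the sketch below runs successive halving over a pool of candidate prompts, scoring each on progressively larger slices of a validation set. The `score_prompt` function and the budget schedule are placeholders for demonstration, not the paper's method.

```python
# Successive halving over candidate prompts: score all prompts on a small
# validation slice, keep the best fraction, and repeat with a larger slice.
# `score_prompt` is a placeholder; in practice it would call the target LLM API.
import random

def score_prompt(prompt, examples):
    # Placeholder metric standing in for task accuracy on the given examples.
    random.seed(hash((prompt, len(examples))) % (2**32))
    return random.random()

def successive_halving(prompts, validation_set, min_budget=8, eta=2):
    budget = min_budget
    candidates = list(prompts)
    while len(candidates) > 1 and budget <= len(validation_set):
        scores = {p: score_prompt(p, validation_set[:budget]) for p in candidates}
        # Keep the top 1/eta fraction of prompts for the next, larger budget.
        keep = max(1, len(candidates) // eta)
        candidates = sorted(candidates, key=scores.get, reverse=True)[:keep]
        budget *= eta
    return candidates[0]

prompts = [f"Answer concisely (variant {i}): " for i in range(8)]
validation_set = list(range(128))  # stand-in for labeled validation examples
print(successive_halving(prompts, validation_set))
```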
Hyperparameter Optimization in Machine Learning
Hyperparameters are configuration variables controlling the behavior of machine learning algorithms. They are ubiquitous in machine learning and artificial intelligence and the choice of their values determines the effectiveness of systems…
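For readers new to the topic, the snippet below shows the simplest practical form of hyperparameter optimization, random sampling with cross-validation, using scikit-learn; the model and search space are illustrative choices, not taken from the book.

```python
# Random-search hyperparameter optimization with cross-validation (scikit-learn).
# The search space below is an illustrative example for a random forest.
from scipy.stats import randint
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_digits(return_X_y=True)
param_distributions = {
    "n_estimators": randint(50, 300),
    "max_depth": randint(3, 20),
    "min_samples_split": randint(2, 11),
}
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=20, cv=3, random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```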
Compressing Large Language Models with Automated Sub-Network Search
Large Language Models (LLMs) demonstrate exceptional reasoning abilities, enabling strong generalization across diverse tasks such as commonsense reasoning and instruction following. However, as LLMs scale, inference costs become increasin…
Structural Pruning of Pre-trained Language Models via Neural Architecture Search
Pre-trained language models (PLMs), for example BERT or RoBERTa, mark the state of the art for natural language understanding tasks when fine-tuned on labeled data. However, their large size poses challenges in deploying them for inference i…
Obeying the Order: Introducing Ordered Transfer Hyperparameter Optimisation
We introduce ordered transfer hyperparameter optimisation (OTHPO), a version of transfer learning for hyperparameter optimisation (HPO) where the tasks follow a sequential order. Unlike for state-of-the-art transfer HPO, the assumption is …
Optimizing Hyperparameters with Conformal Quantile Regression
Many state-of-the-art hyperparameter optimization (HPO) algorithms rely on model-based optimizers that learn surrogate models of the target function to guide the search. Gaussian processes are the de facto surrogate model due to their abil…
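To make the surrogate-modeling idea concrete, here is a hedged sketch of conformalized quantile regression with scikit-learn gradient boosting: two quantile models give a raw interval, and a held-out calibration set widens it to reach the target coverage. The model choice and quantile levels are illustrative, not the paper's surrogate.

```python
# Conformalized quantile regression: fit lower/upper quantile regressors,
# then calibrate the interval width on a held-out set (split-conformal style).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(600, 1))
y = np.sin(X[:, 0]) + 0.3 * rng.standard_normal(600)
X_train, X_cal, y_train, y_cal = train_test_split(X, y, test_size=0.3, random_state=0)

alpha = 0.1  # target miscoverage (90% intervals)
lo = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2).fit(X_train, y_train)
hi = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2).fit(X_train, y_train)

# Conformity scores: how far calibration points fall outside the raw interval.
scores = np.maximum(lo.predict(X_cal) - y_cal, y_cal - hi.predict(X_cal))
q = np.quantile(scores, np.ceil((1 - alpha) * (len(y_cal) + 1)) / len(y_cal))

X_new = np.array([[0.5]])
interval = (lo.predict(X_new) - q, hi.predict(X_new) + q)
print(interval)
```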
HPOBench: A Collection of Reproducible Multi-Fidelity Benchmark Problems for HPO
To achieve peak predictive performance, hyperparameter optimization (HPO) is a crucial component of machine learning and its applications. Over recent years, the number of efficient algorithms and tools for HPO has grown substantially. At the…
Online Optimization of Stimulation Speed in an Auditory Brain-Computer Interface under Time Constraints
The decoding of brain signals recorded via, e.g., an electroencephalogram, using machine learning is key to brain-computer interfaces (BCIs). Stimulation parameters or other experimental settings of the BCI protocol typically are chosen ac…
Overfitting in Bayesian Optimization: an empirical study and early-stopping solution
Tuning machine learning models with Bayesian optimization (BO) is a successful strategy to find good hyperparameters. BO defines an iterative procedure where a cross-validated metric is evaluated on promising hyperparameters. In practice, …
Automatic Termination for Hyperparameter Optimization
Bayesian optimization (BO) is a widely popular approach for the hyperparameter optimization (HPO) in machine learning. At its core, BO iteratively evaluates promising configurations until a user-defined budget, such as wall-clock time or n…
Hyperparameter Transfer Learning with Adaptive Complexity
Bayesian optimization (BO) is a sample-efficient approach to automatically tune the hyperparameters of machine learning models. In practice, one frequently has to solve similar hyperparameter tuning problems sequentially. For example, one …
BORE: Bayesian Optimization by Density-Ratio Estimation
Bayesian optimization (BO) is among the most effective and widely-used blackbox optimization methods. BO proposes solutions according to an explore-exploit trade-off criterion encoded in an acquisition function, many of which are computed …
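The core trick in BORE is that a common acquisition function can be rewritten as a class-posterior probability, so any binary classifier can stand in for the density ratio. The sketch below is a hedged, simplified illustration with a random forest: the classifier choice, the top-quantile threshold gamma, and the candidate sampling are assumptions, not the paper's exact configuration.

```python
# BORE-style acquisition: label the best gamma-fraction of observed points as
# positives, train a classifier, and use its predicted probability as the
# acquisition function when proposing the next point to evaluate.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def objective(x):
    return (x[:, 0] - 0.3) ** 2 + 0.1 * np.sin(8 * x[:, 0])  # toy 1-D blackbox

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(20, 1))           # initial random design
y = objective(X)
gamma = 0.25                                   # fraction treated as "good"

for _ in range(20):
    threshold = np.quantile(y, gamma)
    labels = (y <= threshold).astype(int)      # 1 = among the best points so far
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
    candidates = rng.uniform(0, 1, size=(512, 1))
    probs = clf.predict_proba(candidates)[:, 1]
    x_next = candidates[np.argmax(probs)][None, :]
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next))

print("best found:", X[np.argmin(y), 0], y.min())
```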
Model-based Asynchronous Hyperparameter and Neural Architecture Search
We introduce a model-based asynchronous multi-fidelity method for hyperparameter and neural architecture search that combines the strengths of asynchronous Hyperband and Gaussian process-based Bayesian optimization. At the heart of our met…
Efficient Bayesian Hyperparameter Optimization
Automated machine learning has emerged as a research field within machine learning that aims to progressively automate steps of common machine learning pipelines that are traditionally executed by humans. One of its core tas…
Probabilistic Rollouts for Learning Curve Extrapolation Across Hyperparameter Settings
We propose probabilistic models that can extrapolate learning curves of iterative machine learning algorithms, such as stochastic gradient descent for training deep networks, based on training data with variable-length learning curves. We …
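As a hedged illustration of the extrapolation task (not the probabilistic rollout model itself), the snippet below fits a simple power-law form to the observed part of a learning curve with SciPy and extrapolates it to later epochs; the functional form and starting values are common choices assumed here for demonstration.

```python
# Fit a power-law learning-curve model  y(t) = a - b * t**(-c)  to the first
# epochs of a run and extrapolate it to predict later performance.
import numpy as np
from scipy.optimize import curve_fit

def power_law(t, a, b, c):
    return a - b * np.power(t, -c)

rng = np.random.default_rng(0)
epochs = np.arange(1, 51)
true_curve = 0.92 - 0.4 * epochs ** -0.7
observed = true_curve + 0.005 * rng.standard_normal(epochs.size)

# Fit on the first 15 epochs only, then extrapolate to epoch 50.
t_fit, y_fit = epochs[:15], observed[:15]
params, _ = curve_fit(power_law, t_fit, y_fit, p0=[0.9, 0.5, 0.5], maxfev=10000)
print("predicted accuracy at epoch 50:", power_law(50, *params))
print("observed accuracy at epoch 50:", observed[-1])
```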
Meta-Surrogate Benchmarking for Hyperparameter Optimization
Despite the recent progress in hyperparameter optimization (HPO), available benchmarks that resemble real-world scenarios consist of only a few, very large problem instances that are expensive to solve. This blocks researchers and practition…
Tabular Benchmarks for Joint Architecture and Hyperparameter Optimization
Due to the high computational demands, executing a rigorous comparison between hyperparameter optimization (HPO) methods is often cumbersome. The goal of this paper is to facilitate a better empirical evaluation of HPO methods by providing …
NAS-Bench-101: Towards Reproducible Neural Architecture Search
Recent advances in neural architecture search (NAS) demand tremendous computational resources, which makes it difficult to reproduce experiments and imposes a barrier-to-entry to researchers without access to large-scale computation. We ai…
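For orientation, here is a rough usage sketch of the public NAS-Bench-101 API released with the paper; the dataset file name, cell encoding, and returned fields are reproduced from memory and should be checked against the repository.

```python
# Querying NAS-Bench-101 (sketch; file name and fields as recalled from the release).
# A cell is an upper-triangular adjacency matrix plus a list of node operations.
from nasbench import api

nasbench = api.NASBench("nasbench_only108.tfrecord")

cell = api.ModelSpec(
    matrix=[[0, 1, 1, 0, 0, 0, 1],   # edges leaving the input node
            [0, 0, 0, 1, 0, 0, 0],
            [0, 0, 0, 0, 1, 0, 0],
            [0, 0, 0, 0, 0, 1, 0],
            [0, 0, 0, 0, 0, 0, 1],
            [0, 0, 0, 0, 0, 0, 1],
            [0, 0, 0, 0, 0, 0, 0]],  # output node has no outgoing edges
    ops=["input", "conv3x3-bn-relu", "conv1x1-bn-relu", "maxpool3x3",
         "conv3x3-bn-relu", "conv3x3-bn-relu", "output"],
)
data = nasbench.query(cell)          # tabulated training statistics for this cell
print(data["validation_accuracy"], data["training_time"])
```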
Towards Automatically-Tuned Deep Neural Networks
Recent advances in AutoML have led to automated tools that can compete with machine learning experts on supervised learning tasks. In this work, we present two versions of Auto-Net, which provide automatically-tuned deep neural networks wi…
Auto-sklearn: Efficient and Robust Automated Machine Learning
The success of machine learning in a broad range of applications has led to an ever-growing demand for machine learning systems that can be used off the shelf by non-experts. To be effective in practice, such systems need to automatically …
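As a brief usage illustration, the snippet below runs the auto-sklearn classifier on a small dataset; the time budgets are arbitrary, and the constructor arguments reflect the 0.x releases of the package, so they may need checking against the installed version.

```python
# Off-the-shelf AutoML with auto-sklearn: fit an ensemble of pipelines within a
# fixed time budget and use it like any scikit-learn estimator.
import autosklearn.classification
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=300,   # total budget in seconds (illustrative)
    per_run_time_limit=30,         # budget per candidate pipeline
)
automl.fit(X_train, y_train)
print(accuracy_score(y_test, automl.predict(X_test)))
```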
Towards Automated Deep Learning: Efficient Joint Neural Architecture and Hyperparameter Search
While existing work on neural architecture search (NAS) tunes hyperparameters in a separate post-processing step, we demonstrate that architectural choices and other hyperparameter settings interact in a way that can render this separation…
BOHB: Robust and Efficient Hyperparameter Optimization at Scale
Modern deep learning methods are very sensitive to many hyperparameters, and, due to the long training times of state-of-the-art models, vanilla Bayesian hyperparameter optimization is typically computationally infeasible. On the other han…
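BOHB keeps Hyperband's budget schedule and only replaces its random sampling with a model-based proposal. The short sketch below prints that bracket schedule (how many configurations start at which budget) for a given minimum and maximum budget and halving rate eta; the numbers are illustrative, and the model-based sampling itself is omitted.

```python
# Hyperband bracket schedule: for each bracket s, start n configurations at
# budget r and repeatedly keep the top 1/eta while multiplying the budget by eta.
import math

def hyperband_schedule(min_budget=1, max_budget=81, eta=3):
    s_max = int(math.log(max_budget / min_budget, eta))
    for s in range(s_max, -1, -1):
        n = int(math.ceil((s_max + 1) / (s + 1) * eta ** s))  # initial configs
        r = max_budget * eta ** (-s)                          # initial budget
        rounds = [(int(n * eta ** (-i)), r * eta ** i) for i in range(s + 1)]
        print(f"bracket s={s}: " +
              ", ".join(f"{k} cfgs @ budget {b:g}" for k, b in rounds))

hyperband_schedule()
```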
Uncertainty Estimates for Optical Flow with Multi-Hypotheses Networks
Recent work has shown that optical flow estimation can be formulated as an end-to-end supervised learning problem, which yields estimates with a superior accuracy-runtime tradeoff compared to alternative methodology. In this paper, we make…
Uncertainty Estimates and Multi-Hypotheses Networks for Optical Flow
Optical flow estimation can be formulated as an end-to-end supervised learning problem, which yields estimates with a superior accuracy-runtime tradeoff compared to alternative methodology. In this paper, we make such networks estimate the…
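One simple way to read the multi-hypotheses idea: if a network emits several flow hypotheses per pixel, their spread can serve as a per-pixel uncertainty estimate. The NumPy sketch below computes such a mean/variance aggregate; the learned merging network used in the papers is replaced here by plain averaging, purely for illustration.

```python
# Aggregate M flow hypotheses per pixel into a point estimate and an
# uncertainty map (variance across hypotheses), as a simplified stand-in
# for the learned merging network described in the paper.
import numpy as np

M, H, W = 8, 64, 64                          # hypotheses, image height, width
rng = np.random.default_rng(0)
hypotheses = rng.normal(size=(M, H, W, 2))   # (u, v) flow vectors per hypothesis

flow_mean = hypotheses.mean(axis=0)          # point estimate, shape (H, W, 2)
flow_var = hypotheses.var(axis=0).sum(-1)    # scalar uncertainty per pixel, shape (H, W)
print(flow_mean.shape, flow_var.shape)
```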
Fast Bayesian hyperparameter optimization on large datasets
Bayesian optimization has become a successful tool for optimizing the hyperparameters of machine learning algorithms, such as support vector machines or deep neural networks. Despite its success, for large datasets, training and validating…
The Sacred Infrastructure for Computational Research
We present a toolchain for computational research consisting of Sacred and two supporting tools. Sacred is an open source Python framework which aims to provide basic infrastructure for running computational experiments independent of the …
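To give a feel for the framework, here is a minimal Sacred experiment with a tracked config and a file-storage observer that writes run metadata to disk; the observer path, config values, and model are arbitrary examples, and the constructor style matches recent Sacred releases.

```python
# Minimal Sacred experiment: configuration values are tracked automatically and
# each run (config, metrics, sources) is written to the ./runs directory.
from sacred import Experiment
from sacred.observers import FileStorageObserver

ex = Experiment("svm_demo")
ex.observers.append(FileStorageObserver("runs"))

@ex.config
def config():
    C = 1.0          # regularization strength (tracked by Sacred)
    kernel = "rbf"   # kernel choice (tracked by Sacred)

@ex.automain
def run(C, kernel, _run):
    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC
    score = cross_val_score(SVC(C=C, kernel=kernel), *load_iris(return_X_y=True)).mean()
    _run.log_scalar("cv_accuracy", score)  # metric stored by the observer
    return score
```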