Ronald Ortner
On cubic vertex-transitive graphs of given girth
A set of vertices of a graph is distinguishing if the only automorphism that preserves it is the identity. The minimal size of such sets, if they exist, is the distinguishing cost. The distinguishing costs of vertex-transitive cubic graphs…
A Note on the Bias and Kemeny's Constant in Markov Reward Processes with an Application to Markov Chain Perturbation
Given a unichain Markov reward process (MRP), we provide an explicit expression for the bias values in terms of mean first passage times. This result implies a generalization of known Markov chain perturbation bounds for the stationary dis…
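As context for the quantities involved: Kemeny's constant $K_i = \sum_j \pi_j m_{ij}$ (with $m_{ii} = 0$), built from the stationary distribution $\pi$ and the mean first passage times $m_{ij}$, is famously independent of the starting state $i$. A minimal pure-Python check on a two-state chain (the parameters a, b are arbitrary; this illustrates the classical identity, not the paper's result):

```python
# Two-state chain: P = [[1-a, a], [b, 1-b]] (a, b chosen arbitrarily).
a, b = 0.3, 0.5

# Stationary distribution and mean first passage times, in closed form
# for the two-state case; m[i][i] = 0 by convention.
pi = [b / (a + b), a / (a + b)]
m = [[0.0, 1.0 / a],
     [1.0 / b, 0.0]]

# Kemeny's constant K_i = sum_j pi_j * m[i][j]: independent of i.
K = [sum(pi[j] * m[i][j] for j in range(2)) for i in range(2)]
print(K)  # both entries approximately equal 1/(a+b) = 1.25
```

Both entries agree (up to floating-point rounding), as Kemeny's classical result predicts for any ergodic chain.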
Autonomous Exploration for Navigating in MDPs Using Blackbox RL Algorithms
We consider the problem of navigating in a Markov decision process where extrinsic rewards are either absent or ignored. In this setting, the objective is to learn policies to reach all the states that are reachable within a given number o…
When is Cartesian product a Cayley graph?
A graph is said to be a Cayley graph if its automorphism group admits a regular subgroup. Automorphisms of the Cartesian product of graphs are well understood, and it is known that the Cartesian product of Cayley graphs is a Cayley graph. …
Adaptive Algorithms for Meta-Induction
Work in online learning has traditionally considered induction-friendly (e.g., stochastic with a fixed distribution) and induction-hostile (adversarial) settings separately. While algorithms like Exp3 that have been developed for the adversaria…
Predicting Packaging Sizes Using Machine Learning
The increasing rate of e-commerce orders necessitates a faster packaging process, challenging warehouse employees to correctly choose the size of the package needed to pack each order. To speed up the packing process in the Austrian e-comm…
Transfer in Reinforcement Learning via Regret Bounds for Learning Agents
We present an approach for the quantification of the usefulness of transfer in reinforcement learning via regret bounds for a multi-agent setting. Considering a number of $\aleph$ agents operating in the same Markov decision process, howev…
A new heuristic and an exact approach for a production planning problem
We deal with a complex and computationally hard scheduling problem. Two types of products are processed by a heterogeneous resource set, where resources have different operating capabilities and setup times must be taken into account. The processing of the produ…
Regret Bounds for Reinforcement Learning via Markov Chain Concentration
We give a simple optimistic algorithm for which it is easy to derive regret bounds of $O(\sqrt{t_{\mathrm{mix}} SAT})$ after $T$ steps in uniformly ergodic Markov decision processes with $S$ states, $A$ actions, and mixing time parameter $t_{\mathrm{mix}}$. These bounds are the f…
Regret Bounds for Learning State Representations in Reinforcement Learning
We consider the problem of online reinforcement learning when several state representations (mapping histories to a discrete state space) are available to the learning agent. At least one of these representations is assumed to induce a Mar…
Autonomous exploration for navigating in non-stationary CMPs
We consider a setting in which the objective is to learn to navigate in a controlled Markov process (CMP) where transition probabilities may abruptly change. For this setting, we propose a performance measure called exploration steps which…
Active Roll-outs in MDP with Irreversible Dynamics
In Reinforcement Learning (RL), regret guarantees scaling with the square root of the time horizon have been shown to hold only for communicating Markov decision processes (MDPs) where any two states are connected. This essentially means t…
Variational Regret Bounds for Reinforcement Learning
We consider undiscounted reinforcement learning in Markov decision processes (MDPs) where both the reward functions and the state-transition probabilities may vary (gradually or abruptly) over time. For this problem setting, we propose an …
A Sliding-Window Algorithm for Markov Decision Processes with Arbitrarily Changing Rewards and Transitions
We consider reinforcement learning in changing Markov Decision Processes where both the state-transition probabilities and the reward functions may vary over time. For this problem setting, we propose an algorithm using a sliding window ap…
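The sliding-window idea amounts to forming empirical estimates from only the most recent W observations per state-action pair, so that stale transitions from before a change are forgotten. A minimal sketch (the class name, window size, and data below are illustrative, not the paper's algorithm):

```python
from collections import deque, Counter

class SlidingWindowModel:
    """Empirical next-state frequencies from the last `window` observed
    transitions of each state-action pair (an illustrative sketch)."""
    def __init__(self, window):
        self.window = window
        self.buffers = {}  # (state, action) -> deque of observed next states

    def observe(self, s, a, s_next):
        # deque(maxlen=...) silently drops the oldest sample when full.
        buf = self.buffers.setdefault((s, a), deque(maxlen=self.window))
        buf.append(s_next)

    def estimate(self, s, a):
        # Empirical distribution over next states within the window.
        buf = self.buffers.get((s, a), ())
        n = len(buf)
        return {s2: c / n for s2, c in Counter(buf).items()} if n else {}

model = SlidingWindowModel(window=3)
for s_next in [0, 0, 1, 1, 1]:   # the two old samples fall out of the window
    model.observe(0, "a", s_next)
print(model.estimate(0, "a"))    # {1: 1.0} -- only the last 3 samples count
```

The forgetting is what lets the estimates track a transition kernel that changes over time, at the price of higher variance from the smaller effective sample size.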
Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning
We introduce SCAL, an algorithm designed to perform efficient exploration-exploitation in any unknown weakly-communicating Markov decision process (MDP) for which an upper bound $c$ on the span of the optimal bias function is known. For an…
Pareto Front Identification from Stochastic Bandit Feedback
We consider the problem of identifying the Pareto front for multiple objectives from a finite set of operating points. Sampling an operating point gives a random vector where each coordinate corresponds to the value of one of the objective…
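The underlying notion of Pareto dominance can be sketched in a few lines. Below is just the deterministic front computation on known mean vectors (the example values are hypothetical), not the bandit algorithm itself, which must handle the sampling noise:

```python
def dominates(u, v):
    """True if u is at least as good as v in every objective (higher is
    better) and strictly better in at least one."""
    return all(x >= y for x, y in zip(u, v)) and any(x > y for x, y in zip(u, v))

def pareto_front(points):
    """Indices of the points not dominated by any other point."""
    return [i for i, p in enumerate(points)
            if not any(dominates(q, p) for j, q in enumerate(points) if j != i)]

# Hypothetical mean vectors for four operating points, two objectives each.
means = [(0.9, 0.1), (0.5, 0.5), (0.2, 0.8), (0.4, 0.4)]
print(pareto_front(means))  # [0, 1, 2]: (0.4, 0.4) is dominated by (0.5, 0.5)
```

With bandit feedback the means are only observed through noisy samples, which is what makes identifying this front a statistical problem rather than the simple scan above.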
Improved learning complexity in combinatorial pure exploration bandits
We study the problem of combinatorial pure exploration in the stochastic multi-armed bandit problem. We first construct a new measure of complexity that provably characterizes the learning performance of the algorithms we propose for the f…