Michael Kaisers
Mastering Board Games by External and Internal Planning with Language Models Open
Advancing planning and reasoning capabilities of Large Language Models (LLMs) is one of the key prerequisites towards unlocking their potential for performing reliably in complex and impactful domains. In this paper, we aim to demonstrate …
Soft Condorcet Optimization for Ranking of General Agents Open
Driving progress of AI models and agents requires comparing their performance on standardized benchmarks; for general agents, individual performances must be aggregated across a potentially wide variety of different tasks. In this paper, w…
TacticAI: an AI assistant for football tactics Open
Identifying key patterns of tactics implemented by rival teams, and developing effective responses, lies at the heart of modern football. However, doing so algorithmically remains an open research challenge. To address this unmet need, we …
Approximating the Core via Iterative Coalition Sampling Open
The core is a central solution concept in cooperative game theory, defined as the set of feasible allocations or payments such that no subset of agents has incentive to break away and form their own subgroup or coalition. However, it has l…
BRExIt: On Opponent Modelling in Expert Iteration Open
Finding a best response policy is a central objective in game theory and multi-agent learning, with modern population-based training approaches employing reinforcement learning algorithms as best-response oracles to improve play against ca…
Online Planning in POMDPs with Self-Improving Simulators Open
How can we plan efficiently in a large and complex environment when the time budget is limited? Given the original simulator of the environment, which may be computationally very demanding, we propose to learn online an approximate but muc…
Bargaining Chips: Coordinating One-to-Many Concurrent Composite Negotiations Open
This study presents Bargaining Chips: a framework for one-to-many concurrent composite negotiations, where multiple deals can be reached and combined. Our framework is designed to mirror the salient aspects of real-life procurement and tra…
ME-MCTS: Online Generalization by Combining Multiple Value Estimators Open
This paper addresses the challenge of online generalization in tree search. We propose Multiple Estimator Monte Carlo Tree Search (ME-MCTS), with a two-fold contribution: first, we introduce a formalization of online generalization that ca…
Novelty and MCTS Open
Novelty search has become a popular technique in different fields such as evolutionary computing, classical AI planning, and deep reinforcement learning. Searching for novelty instead of, or in addition to, directly maximizing the search o…
Robust online planning with imperfect models Open
Environment models are not always known a priori, and approximating stochastic transition dynamics may introduce errors, especially if
Towards explainable MCTS Open
Monte-Carlo Tree Search (MCTS) is a family of sampling-based search algorithms widely used for online planning in sequential decision-making domains, and at the heart of many recent breakthroughs in AI. Understanding the behavior of MCTS a…
Robust multi-agent Q-learning in cooperative games with adversaries Open
We present RoM-Q, a new Q-learning-like algorithm for finding policies robust to attacks in multi-agent systems (MAS). We consider a novel type of attack, where a team of adversaries, aware of the optimal multi-agent Q-value function, pe…
Opponent-Pruning Paranoid Search Open
This paper proposes a new search algorithm for fully observable, deterministic multiplayer games: Opponent-Pruning Paranoid Search (OPPS). OPPS is a generalization of a state-of-the-art technique for this class of games, Best-Reply Search …
Guiding Multiplayer MCTS by Focusing on Yourself Open
In n-player sequential move games, the second root-player move appears at tree depth n + 1. Depending on n and time, tree search techniques can struggle to expand the game tree deeply enough to find multiple-move plans of the root player, …
Degrees of Rationality in Agent-Based Retail Markets Open
The imperfect decision-making of human buyers participating in retail markets varies from fundamental models that assume rational economic choices: even in markets with identical items human buyers are not rational, i.e., buyers do not alw…
Automated Peer-to-peer Negotiation for Energy Contract Settlements in Residential Cooperatives Open
This paper presents an automated peer-to-peer negotiation strategy for settling energy contracts among prosumers in a Residential Energy Cooperative considering heterogeneous prosumer preferences. The heterogeneity arises from prosumers' e…
Automated Negotiation with Gaussian Process-based Utility Models Open
Designing agents that can efficiently learn and integrate user's preferences into decision making processes is a key challenge in automated negotiation. While accurate knowledge of user preferences is highly desirable, eliciting the necess…
Forecast-Based Mechanisms for Demand Response Open
We study mechanisms to incentivize demand response in smart energy systems. We assume agents that can respond (reduce their demand) with some probability if they prepare prior to the realization of the demand. Both preparation and respons…
Robust Temporal Difference Learning for Critical Domains Open
We present a new Q-function operator for temporal difference (TD) learning methods that explicitly encodes robustness against significant rare events (SRE) in critical domains. The operator, which we call the κ-operator, allows to learn a …
An Exchange Mechanism to Coordinate Flexibility in Residential Energy Cooperatives Open
Energy cooperatives (ECs) such as residential and industrial microgrids have the potential to mitigate increasing fluctuations in renewable electricity generation, but only if their joint response is coordinated. However, the coordination …