Ti-Rong Wu
Dynamic Sight Range Selection in Multi-Agent Reinforcement Learning
Multi-agent reinforcement learning (MARL) is often challenged by the sight range dilemma, where agents either receive insufficient or excessive information from their environment. In this paper, we propose a novel method, called Dynamic Si…
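The core mechanism, in rough form: each agent's observation is cropped to a chosen sight range, and a dynamic approach lets the agent pick that range per step. A minimal sketch (all names are hypothetical, not the paper's implementation):

```python
import numpy as np

def crop_to_sight_range(grid, agent_pos, sight_range):
    """Return an agent-centered view of `grid` limited to `sight_range`.
    Cells outside the grid are zero-padded; a larger range reveals more
    of the environment, a smaller one filters information out."""
    r = sight_range
    view = np.zeros((2 * r + 1, 2 * r + 1), dtype=grid.dtype)
    x, y = agent_pos
    for dx in range(-r, r + 1):
        for dy in range(-r, r + 1):
            gx, gy = x + dx, y + dy
            if 0 <= gx < grid.shape[0] and 0 <= gy < grid.shape[1]:
                view[dx + r, dy + r] = grid[gx, gy]
    return view

grid = np.arange(25).reshape(5, 5)
print(crop_to_sight_range(grid, (2, 2), 1))  # the agent's 3x3 local view
```

A dynamic variant would let each agent select `sight_range` per step, for example from a small discrete set, alongside its environment action.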
OptionZero: Planning with Learned Options
Planning with options -- sequences of primitive actions -- has been shown to be effective in reinforcement learning within complex environments. Previous studies have focused on planning with predefined options or learned options through expert…
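As a rough illustration of what planning with an option means, the sketch below (names hypothetical, not the paper's method) applies a sequence of primitive actions as one macro step, accumulating reward along the way:

```python
def apply_option(state, option, step):
    """Apply an option -- a sequence of primitive actions -- as a single
    planning step, returning the resulting state and accumulated reward."""
    total_reward = 0.0
    for action in option:
        state, reward, done = step(state, action)
        total_reward += reward
        if done:
            break
    return state, total_reward

# Toy environment: state is an integer, an action adds to it.
def toy_step(state, action):
    new_state = state + action
    return new_state, float(action), new_state >= 10

print(apply_option(0, [3, 3, 3], toy_step))  # (9, 9.0)
```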
Bridging Local and Global Knowledge via Transformer in Board Games
Although AlphaZero has achieved superhuman performance in board games, recent studies reveal its limitations in handling scenarios requiring a comprehensive understanding of the entire board, such as recognizing long-sequence patterns in G…
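One generic way to combine local and global knowledge, sketched below in PyTorch: convolutions extract local shape features, and a transformer encoder mixes them across the whole board. The architecture here is an illustrative assumption, not the paper's network:

```python
import torch
import torch.nn as nn

class LocalGlobalNet(nn.Module):
    """Hypothetical hybrid: convolutions capture local board shapes,
    then a transformer encoder propagates information globally."""
    def __init__(self, channels=64, board_size=19):
        super().__init__()
        self.local = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
        )
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=channels, nhead=4, batch_first=True)
        self.global_mix = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.pos = nn.Parameter(torch.zeros(board_size * board_size, channels))

    def forward(self, board):                  # board: (B, 1, H, W)
        x = self.local(board)                  # local features (B, C, H, W)
        tokens = x.flatten(2).transpose(1, 2)  # one token per intersection
        return self.global_mix(tokens + self.pos)  # globally mixed features

net = LocalGlobalNet()
print(net(torch.zeros(1, 1, 19, 19)).shape)    # torch.Size([1, 361, 64])
```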
Game Solving with Online Fine-Tuning
Game solving is a task similar to, yet more difficult than, mastering a game. Solving a game typically means finding the game-theoretic value (the outcome given optimal play), and optionally a full strategy to follow in order to achieve that outco…
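For concreteness, the game-theoretic value of a position can be written as a plain negamax recursion. The sketch below is generic (not the paper's solver) and assumes a game that ends exactly when no legal moves remain, scored +1/0/-1 for the player to move:

```python
def negamax(state, legal_moves, play, terminal_value):
    """Game-theoretic value of `state` for the player to move:
    +1 win, 0 draw, -1 loss, assuming both sides play optimally."""
    moves = legal_moves(state)
    if not moves:
        return terminal_value(state)  # game over: score the position
    # The best we can do is the move that is worst for the opponent.
    return max(-negamax(play(state, m), legal_moves, play, terminal_value)
               for m in moves)
```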
MiniZero: Comparative Analysis of AlphaZero and MuZero on Go, Othello, and Atari Games
This paper presents MiniZero, a zero-knowledge learning framework that supports four state-of-the-art algorithms: AlphaZero, MuZero, Gumbel AlphaZero, and Gumbel MuZero. While these algorithms have demonstrated super-human perfor…
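All four algorithms share the same outer loop of self-play followed by network optimization; a hedged skeleton of that shared loop (not MiniZero's actual API):

```python
def zero_training_loop(net, self_play, optimize, iterations):
    """Outer loop common to AlphaZero-style and MuZero-style training:
    generate games by MCTS-guided self-play (no human data), then
    update the network from the replay buffer."""
    replay_buffer = []
    for _ in range(iterations):
        replay_buffer.extend(self_play(net))  # data generation
        net = optimize(net, replay_buffer)    # network optimization
    return net
```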
A Local-Pattern Related Look-Up Table
This paper describes a Relevance-Zone pattern table (RZT) that can be used to replace a traditional transposition table. An RZT stores exact game values for patterns that are discovered during a Relevance-Zone-Based Search (RZS), which is …
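In rough form, an RZT behaves like a dictionary keyed by the contents of a relevance zone rather than by a whole position; a minimal sketch with a hypothetical pattern encoding:

```python
class RelevanceZonePatternTable:
    """Maps the contents of a relevance zone (a local pattern) to an
    exact game value, so the value is reused whenever the same pattern
    reappears, even in otherwise different whole-board positions."""
    def __init__(self):
        self.table = {}

    def store(self, zone_pattern, value):
        self.table[zone_pattern] = value      # pattern must be hashable

    def probe(self, zone_pattern):
        return self.table.get(zone_pattern)   # None if never solved

rzt = RelevanceZonePatternTable()
rzt.store((("B", 0, 0), ("W", 1, 0)), +1)     # hypothetical encoding
print(rzt.probe((("B", 0, 0), ("W", 1, 0))))  # 1
```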
Are AlphaZero-like Agents Robust to Adversarial Perturbations?
The success of AlphaZero (AZ) has demonstrated that neural-network-based Go AIs can surpass human performance by a large margin. Given that the state space of Go is extremely large and a human player can play the game from any legal state,…
A Novel Approach to Solving Goal-Achieving Problems for Board Games
Goal-achieving problems are puzzles that set up a specific situation with a clear objective. An example that is well-studied is the category of life-and-death (L&D) problems for Go, which helps players hone their skill of identifying regio…
Learning to Stop: Dynamic Simulation Monte-Carlo Tree Search
Monte Carlo tree search (MCTS) has achieved state-of-the-art results in many domains, such as Go and Atari games, when combined with deep neural networks (DNNs). When more simulations are executed, MCTS can achieve higher performance but al…
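A simple way to realize a dynamic simulation count, sketched below: run simulations in chunks and stop once the chosen move stabilizes. Note that the paper learns the stopping decision; this hand-coded stability check is only a stand-in, and all names are hypothetical:

```python
def dynamic_simulation_mcts(root, run_simulations, best_child,
                            max_simulations=800, check_every=50):
    """Run MCTS in chunks and stop early once the selected move stops
    changing between checks, saving simulations on easy positions."""
    done, previous_best = 0, None
    while done < max_simulations:
        run_simulations(root, check_every)
        done += check_every
        current_best = best_child(root)
        if current_best == previous_best:
            break                              # search has converged
        previous_best = current_best
    return best_child(root)
```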
Strength Adjustment and Assessment for MCTS-Based Programs [Research Frontier]
2048 is a single-player stochastic puzzle game. This intriguing and addictive game has been popular worldwide and has attracted researchers to develop game-playing programs. Due to its simplicity and complexity, 2048 has become an inter…
Accelerating and Improving AlphaZero Using Population Based Training
AlphaZero has been very successful in many games. Unfortunately, it still consumes a huge amount of computing resources, the majority of which is spent in self-play. Hyperparameter tuning exacerbates the training cost since each hyperparam…
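Population based training (PBT) tunes hyperparameters on the fly by having weak population members copy strong ones and then perturb their settings, instead of launching a separate full run per hyperparameter choice. A minimal generic sketch (not the paper's exact schedule):

```python
import copy
import random

def pbt_step(population):
    """One PBT step over a population of AlphaZero-style learners.
    Each member is a dict with 'net', 'hyperparams', and 'fitness'
    (e.g. win rate in evaluation games). The weakest member copies
    the strongest (exploit), then perturbs its settings (explore)."""
    ranked = sorted(population, key=lambda m: m["fitness"], reverse=True)
    top, bottom = ranked[0], ranked[-1]
    bottom["net"] = copy.deepcopy(top["net"])              # exploit
    bottom["hyperparams"] = {k: v * random.choice([0.8, 1.2])
                             for k, v in top["hyperparams"].items()}  # explore
    return population
```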
On Strength Adjustment for MCTS-Based Programs
This paper proposes an approach to strength adjustment for MCTS-based game-playing programs. In this approach, we use a softmax policy with a strength index z to choose moves. Most importantly, we filter low-quality moves by excluding thos…
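A hedged sketch of strength-indexed selection, assuming move quality is proxied by MCTS visit counts: moves far below the most-visited move are filtered out, and the remainder are sampled with exponent 1/z. The threshold ratio and the exact functional form here are assumptions, not the paper's definitions:

```python
import random

def strength_adjusted_move(visit_counts, z, min_ratio=0.1):
    """Pick a move from MCTS visit counts with strength index z.
    Moves with few visits relative to the best move are excluded as
    low quality; the rest are sampled with weight N**(1/z), so a small
    z plays near-optimally and a larger z plays more weakly."""
    n_max = max(visit_counts.values())
    candidates = {m: n for m, n in visit_counts.items()
                  if n >= min_ratio * n_max}   # filter low-quality moves
    weights = {m: n ** (1.0 / z) for m, n in candidates.items()}
    threshold, acc = random.random() * sum(weights.values()), 0.0
    for move, w in weights.items():
        acc += w
        if threshold <= acc:
            return move
    return move  # floating-point fallback

print(strength_adjusted_move({"a": 900, "b": 450, "c": 20}, z=2.0))
```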
Multi-Labelled Value Networks for Computer Go
This paper proposes a novel value network architecture for the game of Go, called a multi-labelled (ML) value network. In the ML value network, different values (win rates) are trained simultaneously for different settings o…
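A minimal sketch of the multi-labelled idea, assuming a PyTorch head over shared board features: one forward pass yields a win rate per komi setting. The komi values and layer sizes here are illustrative assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class MultiLabelledValueHead(nn.Module):
    """Predicts one win rate per komi setting from shared board
    features, so all value labels are trained simultaneously."""
    def __init__(self, feature_dim=256, komi_values=(5.5, 6.5, 7.5)):
        super().__init__()
        self.komi_values = komi_values
        self.head = nn.Sequential(
            nn.Linear(feature_dim, 128), nn.ReLU(),
            nn.Linear(128, len(komi_values)), nn.Sigmoid(),
        )

    def forward(self, features):          # features: (B, feature_dim)
        return self.head(features)        # (B, num_komi) win rates

head = MultiLabelledValueHead()
print(head(torch.randn(1, 256)).shape)    # torch.Size([1, 3])
```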