Hung Guei
OptionZero: Planning with Learned Options
Planning with options (sequences of primitive actions) has been shown to be effective in reinforcement learning within complex environments. Previous studies have focused on planning with predefined options or learned options through expert…
Game Solving with Online Fine-Tuning
Game solving is similar to, yet more difficult than, mastering a game. Solving a game typically means finding the game-theoretic value (the outcome given optimal play), and optionally a full strategy to follow in order to achieve that outco…
MiniZero: Comparative Analysis of AlphaZero and MuZero on Go, Othello, and Atari Games
This paper presents MiniZero, a zero-knowledge learning framework that supports four state-of-the-art algorithms, including AlphaZero, MuZero, Gumbel AlphaZero, and Gumbel MuZero. While these algorithms have demonstrated super-human perfor…
Optimistic Temporal Difference Learning for 2048
Temporal difference (TD) learning and its variants, such as multistage TD (MS-TD) learning and temporal coherence (TC) learning, have been successfully applied to 2048. These methods rely on the stochasticity of the environment of 2048 …
Strength Adjustment and Assessment for MCTS-Based Programs [Research Frontier]
2048 is a single-player stochastic puzzle game. This intriguing and addictive game has been popular worldwide and has attracted researchers to develop game-playing programs. Due to its simplicity and complexity, 2048 has become an inter…
On Strength Adjustment for MCTS-Based Programs
This paper proposes an approach to strength adjustment for MCTS-based game-playing programs. In this approach, we use a softmax policy with a strength index z to choose moves. Most importantly, we filter low-quality moves by excluding thos…