On Thompson Sampling and Asymptotic Optimality
Related Concepts
Reinforcement learning
Sublinear function
Thompson sampling
Countable set
Regret
Ergodic theory
Sampling (signal processing)
Mathematics
Class (philosophy)
Nonparametric statistics
Mathematical optimization
Computer science
Markov decision process
Markov process
Applied mathematics
Discrete mathematics
Artificial intelligence
Statistics
Pure mathematics
Computer vision
Filter (signal processing)
Jan Leike, Tor Lattimore, Laurent Orseau, Marcus Hutter · 2017 · Open Access
· DOI: https://doi.org/10.24963/ijcai.2017/688
· OA: W2740008209
We discuss some recent results on Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments. These environments can be non-Markovian, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) asymptotically its value converges in mean to the optimal value, and (2) given a recoverability assumption, regret is sublinear. We conclude with a discussion about optimality in reinforcement learning.
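For readers unfamiliar with Thompson sampling, the core idea is to maintain a posterior over environments, sample one environment from it, and act optimally with respect to the sample. The paper's setting is general (non-Markovian, partially observable) environments, which is far beyond what a short snippet can capture; the following is only a minimal illustrative sketch of the same principle in the simplest case, a Beta-Bernoulli bandit. All names here (`thompson_sampling`, the arm means, the horizon) are illustrative choices, not from the paper.

```python
import random

def thompson_sampling(true_means, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling on a toy bandit.

    Illustrative sketch only: sample a model from the posterior,
    then act greedily with respect to the sampled model.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    # Beta(1, 1) prior for each arm; posterior stays Beta after
    # Bernoulli observations (conjugacy).
    alpha = [1] * n_arms  # 1 + number of observed successes
    beta = [1] * n_arms   # 1 + number of observed failures
    total_reward = 0
    for _ in range(horizon):
        # Sample a plausible mean for each arm from its posterior ...
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n_arms)]
        # ... and play the arm whose sampled mean is largest.
        arm = max(range(n_arms), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        total_reward += reward
    return total_reward

print(thompson_sampling([0.3, 0.7], horizon=1000))
```

As the posterior concentrates on the better arm, the sampled means increasingly favor it, which is the bandit analogue of the value-convergence result stated in the abstract.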