Levi Lu
Boosted Off-Policy Learning
We propose the first boosting algorithm for off-policy learning from logged bandit feedback. Unlike existing boosting methods for supervised learning, our algorithm directly optimizes an estimate of the policy's expected reward. We analyze…