Minimax Regret Bounds for Reinforcement Learning

Mohammad Gheshlaghi Azar, Ian Osband, Rémi Munos

2017 · Open Access · DOI: https://doi.org/10.48550/arxiv.1703.05449

We consider the problem of provably optimal exploration in reinforcement learning for finite horizon MDPs. We show that an optimistic modification to value iteration achieves a regret bound of $\tilde{O}(\sqrt{HSAT} + H^2S^2A + H\sqrt{T})$, where $H$ is the time horizon, $S$ the number of states, $A$ the number of actions and $T$ the number of time-steps. This result improves over the best previously known bound of $\tilde{O}(HS\sqrt{AT})$, achieved by the UCRL2 algorithm of Jaksch et al. (2010). The key significance of our new results is that when $T \geq H^3S^3A$ and $SA \geq H$, it leads to a regret of $\tilde{O}(\sqrt{HSAT})$ that matches the established lower bound of $\Omega(\sqrt{HSAT})$ up to a logarithmic factor. Our analysis contains two key insights. We apply concentration inequalities carefully to the optimal value function as a whole, rather than to the transition probabilities (to improve scaling in $S$), and we define Bernstein-based "exploration bonuses" that use the empirical variance of the estimated values at the next states (to improve scaling in $H$).
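The two regime conditions in the abstract can be checked by squaring: $H^2S^2A \leq \sqrt{HSAT}$ holds exactly when $T \geq H^3S^3A$, and $H\sqrt{T} \leq \sqrt{HSAT}$ holds exactly when $SA \geq H$. The following is a minimal numeric sketch of that comparison, with constants and logarithmic factors dropped. The function names and the example values of $H$, $S$, $A$, $T$ are illustrative assumptions, not taken from the paper.

```python
import math


def regret_terms(H, S, A, T):
    """Terms of the stated bounds, ignoring constants and log factors."""
    return {
        "leading": math.sqrt(H * S * A * T),   # sqrt(HSAT)
        "lower_order": H**2 * S**2 * A,        # H^2 S^2 A
        "horizon": H * math.sqrt(T),           # H sqrt(T)
        "ucrl2": H * S * math.sqrt(A * T),     # previous bound HS sqrt(AT)
    }


def leading_term_dominates(H, S, A, T):
    """True in the regime the abstract names: T >= H^3 S^3 A and SA >= H."""
    return T >= H**3 * S**3 * A and S * A >= H


# Hypothetical problem size chosen only for illustration.
H, S, A, T = 10, 20, 5, 10**8
terms = regret_terms(H, S, A, T)
print(leading_term_dominates(H, S, A, T))
print(terms["ucrl2"] / terms["leading"], math.sqrt(H * S))
```

In this regime the ratio of the UCRL2 rate to the leading term is $\sqrt{HS}$, which is the size of the improvement the abstract describes.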

Related Topics

Logarithm

Combinatorics

Reinforcement Learning

Mathematical Analysis

Concepts

Regret, Logarithm, Combinatorics, Upper and lower bounds, Scaling, Minimax, Reinforcement learning, Omega, Mathematics, Horizon, Function (biology), Key (lock), Discrete mathematics, Computer science, Mathematical optimization, Physics, Statistics, Machine learning, Mathematical analysis, Quantum mechanics, Computer security, Geometry, Biology, Evolutionary biology

Metadata

Type: preprint
Language: en
Landing Page: http://arxiv.org/abs/1703.05449
PDF: https://arxiv.org/pdf/1703.05449
OA Status: green
Cited By: 50
References: 20
Related Works: 20
OpenAlex ID: https://openalex.org/W2604884452

All OpenAlex metadata

OpenAlex ID: https://openalex.org/W2604884452

Canonical identifier for this work in OpenAlex
DOI: https://doi.org/10.48550/arxiv.1703.05449

Digital Object Identifier
Title: Minimax Regret Bounds for Reinforcement Learning

Work title
Type: preprint

OpenAlex work type
Language: en

Primary language
Publication year: 2017

Year of publication
Publication date: 2017-03-16

Full publication date if available
Authors: Mohammad Gheshlaghi Azar, Ian Osband, Rémi Munos

List of authors in order
Landing page: https://arxiv.org/abs/1703.05449

Publisher landing page
PDF URL: https://arxiv.org/pdf/1703.05449

Direct link to full text PDF
Open access: Yes

Whether a free full text is available
OA status: green

Open access status per OpenAlex
OA URL: https://arxiv.org/pdf/1703.05449

Direct OA link when available
Concepts: Regret, Logarithm, Combinatorics, Upper and lower bounds, Scaling, Minimax, Reinforcement learning, Omega, Mathematics, Horizon, Function (biology), Key (lock), Discrete mathematics, Computer science, Mathematical optimization, Physics, Statistics, Machine learning, Mathematical analysis, Quantum mechanics, Computer security, Geometry, Biology, Evolutionary biology

Top concepts (fields/topics) attached by OpenAlex
Cited by: 50

Total citation count in OpenAlex
Citations by year (recent): 2025: 1, 2024: 2, 2023: 2, 2022: 3, 2021: 13

Per-year citation counts (last 5 years)
References (count): 20

Number of works referenced by this work
Related works (count): 20

Other works algorithmically related by OpenAlex

Full payload

id	https://openalex.org/W2604884452
doi	https://doi.org/10.48550/arxiv.1703.05449
ids.doi	https://doi.org/10.48550/arxiv.1703.05449
ids.mag	2604884452
ids.openalex	https://openalex.org/W2604884452
fwci
type	preprint
title	Minimax Regret Bounds for Reinforcement Learning
biblio.issue
biblio.volume
biblio.last_page	272
biblio.first_page	263
topics[0].id	https://openalex.org/T12101
topics[0].field.id	https://openalex.org/fields/18
topics[0].field.display_name	Decision Sciences
topics[0].score	0.9998999834060669
topics[0].domain.id	https://openalex.org/domains/2
topics[0].domain.display_name	Social Sciences
topics[0].subfield.id	https://openalex.org/subfields/1803
topics[0].subfield.display_name	Management Science and Operations Research
topics[0].display_name	Advanced Bandit Algorithms Research
topics[1].id	https://openalex.org/T10462
topics[1].field.id	https://openalex.org/fields/17
topics[1].field.display_name	Computer Science
topics[1].score	0.9983999729156494
topics[1].domain.id	https://openalex.org/domains/3
topics[1].domain.display_name	Physical Sciences
topics[1].subfield.id	https://openalex.org/subfields/1702
topics[1].subfield.display_name	Artificial Intelligence
topics[1].display_name	Reinforcement Learning in Robotics
topics[2].id	https://openalex.org/T12072
topics[2].field.id	https://openalex.org/fields/17
topics[2].field.display_name	Computer Science
topics[2].score	0.9909999966621399
topics[2].domain.id	https://openalex.org/domains/3
topics[2].domain.display_name	Physical Sciences
topics[2].subfield.id	https://openalex.org/subfields/1702
topics[2].subfield.display_name	Artificial Intelligence
topics[2].display_name	Machine Learning and Algorithms
is_xpac	False
apc_list
apc_paid
concepts[0].id	https://openalex.org/C50817715
concepts[0].level	2
concepts[0].score	0.7074787020683289
concepts[0].wikidata	https://www.wikidata.org/wiki/Q79895177
concepts[0].display_name	Regret
concepts[1].id	https://openalex.org/C39927690
concepts[1].level	2
concepts[1].score	0.659284770488739
concepts[1].wikidata	https://www.wikidata.org/wiki/Q11197
concepts[1].display_name	Logarithm
concepts[2].id	https://openalex.org/C114614502
concepts[2].level	1
concepts[2].score	0.6382050514221191
concepts[2].wikidata	https://www.wikidata.org/wiki/Q76592
concepts[2].display_name	Combinatorics
concepts[3].id	https://openalex.org/C77553402
concepts[3].level	2
concepts[3].score	0.6331272125244141
concepts[3].wikidata	https://www.wikidata.org/wiki/Q13222579
concepts[3].display_name	Upper and lower bounds
concepts[4].id	https://openalex.org/C99844830
concepts[4].level	2
concepts[4].score	0.5872185826301575
concepts[4].wikidata	https://www.wikidata.org/wiki/Q102441924
concepts[4].display_name	Scaling
concepts[5].id	https://openalex.org/C149728462
concepts[5].level	2
concepts[5].score	0.5683568716049194
concepts[5].wikidata	https://www.wikidata.org/wiki/Q751319
concepts[5].display_name	Minimax
concepts[6].id	https://openalex.org/C97541855
concepts[6].level	2
concepts[6].score	0.5659250617027283
concepts[6].wikidata	https://www.wikidata.org/wiki/Q830687
concepts[6].display_name	Reinforcement learning
concepts[7].id	https://openalex.org/C2779557605
concepts[7].level	2
concepts[7].score	0.5628350377082825
concepts[7].wikidata	https://www.wikidata.org/wiki/Q9890
concepts[7].display_name	Omega
concepts[8].id	https://openalex.org/C33923547
concepts[8].level	0
concepts[8].score	0.5451778769493103
concepts[8].wikidata	https://www.wikidata.org/wiki/Q395
concepts[8].display_name	Mathematics
concepts[9].id	https://openalex.org/C159176650
concepts[9].level	2
concepts[9].score	0.5010685920715332
concepts[9].wikidata	https://www.wikidata.org/wiki/Q43261
concepts[9].display_name	Horizon
concepts[10].id	https://openalex.org/C14036430
concepts[10].level	2
concepts[10].score	0.46781039237976074
concepts[10].wikidata	https://www.wikidata.org/wiki/Q3736076
concepts[10].display_name	Function (biology)
concepts[11].id	https://openalex.org/C26517878
concepts[11].level	2
concepts[11].score	0.44248804450035095
concepts[11].wikidata	https://www.wikidata.org/wiki/Q228039
concepts[11].display_name	Key (lock)
concepts[12].id	https://openalex.org/C118615104
concepts[12].level	1
concepts[12].score	0.38671889901161194
concepts[12].wikidata	https://www.wikidata.org/wiki/Q121416
concepts[12].display_name	Discrete mathematics
concepts[13].id	https://openalex.org/C41008148
concepts[13].level	0
concepts[13].score	0.19477537274360657
concepts[13].wikidata	https://www.wikidata.org/wiki/Q21198
concepts[13].display_name	Computer science
concepts[14].id	https://openalex.org/C126255220
concepts[14].level	1
concepts[14].score	0.1728718876838684
concepts[14].wikidata	https://www.wikidata.org/wiki/Q141495
concepts[14].display_name	Mathematical optimization
concepts[15].id	https://openalex.org/C121332964
concepts[15].level	0
concepts[15].score	0.14904409646987915
concepts[15].wikidata	https://www.wikidata.org/wiki/Q413
concepts[15].display_name	Physics
concepts[16].id	https://openalex.org/C105795698
concepts[16].level	1
concepts[16].score	0.11103245615959167
concepts[16].wikidata	https://www.wikidata.org/wiki/Q12483
concepts[16].display_name	Statistics
concepts[17].id	https://openalex.org/C119857082
concepts[17].level	1
concepts[17].score	0.07832729816436768
concepts[17].wikidata	https://www.wikidata.org/wiki/Q2539
concepts[17].display_name	Machine learning
concepts[18].id	https://openalex.org/C134306372
concepts[18].level	1
concepts[18].score	0.05842462182044983
concepts[18].wikidata	https://www.wikidata.org/wiki/Q7754
concepts[18].display_name	Mathematical analysis
concepts[19].id	https://openalex.org/C62520636
concepts[19].level	1
concepts[19].score	0.053871095180511475
concepts[19].wikidata	https://www.wikidata.org/wiki/Q944
concepts[19].display_name	Quantum mechanics
concepts[20].id	https://openalex.org/C38652104
concepts[20].level	1
concepts[20].score	0.0
concepts[20].wikidata	https://www.wikidata.org/wiki/Q3510521
concepts[20].display_name	Computer security
concepts[21].id	https://openalex.org/C2524010
concepts[21].level	1
concepts[21].score	0.0
concepts[21].wikidata	https://www.wikidata.org/wiki/Q8087
concepts[21].display_name	Geometry
concepts[22].id	https://openalex.org/C86803240
concepts[22].level	0
concepts[22].score	0.0
concepts[22].wikidata	https://www.wikidata.org/wiki/Q420
concepts[22].display_name	Biology
concepts[23].id	https://openalex.org/C78458016
concepts[23].level	1
concepts[23].score	0.0
concepts[23].wikidata	https://www.wikidata.org/wiki/Q840400
concepts[23].display_name	Evolutionary biology
keywords[0].id	https://openalex.org/keywords/regret
keywords[0].score	0.7074787020683289
keywords[0].display_name	Regret
keywords[1].id	https://openalex.org/keywords/logarithm
keywords[1].score	0.659284770488739
keywords[1].display_name	Logarithm
keywords[2].id	https://openalex.org/keywords/combinatorics
keywords[2].score	0.6382050514221191
keywords[2].display_name	Combinatorics
keywords[3].id	https://openalex.org/keywords/upper-and-lower-bounds
keywords[3].score	0.6331272125244141
keywords[3].display_name	Upper and lower bounds
keywords[4].id	https://openalex.org/keywords/scaling
keywords[4].score	0.5872185826301575
keywords[4].display_name	Scaling
keywords[5].id	https://openalex.org/keywords/minimax
keywords[5].score	0.5683568716049194
keywords[5].display_name	Minimax
keywords[6].id	https://openalex.org/keywords/reinforcement-learning
keywords[6].score	0.5659250617027283
keywords[6].display_name	Reinforcement learning
keywords[7].id	https://openalex.org/keywords/omega
keywords[7].score	0.5628350377082825
keywords[7].display_name	Omega
keywords[8].id	https://openalex.org/keywords/mathematics
keywords[8].score	0.5451778769493103
keywords[8].display_name	Mathematics
keywords[9].id	https://openalex.org/keywords/horizon
keywords[9].score	0.5010685920715332
keywords[9].display_name	Horizon
keywords[10].id	https://openalex.org/keywords/function
keywords[10].score	0.46781039237976074
keywords[10].display_name	Function (biology)
keywords[11].id	https://openalex.org/keywords/key
keywords[11].score	0.44248804450035095
keywords[11].display_name	Key (lock)
keywords[12].id	https://openalex.org/keywords/discrete-mathematics
keywords[12].score	0.38671889901161194
keywords[12].display_name	Discrete mathematics
keywords[13].id	https://openalex.org/keywords/computer-science
keywords[13].score	0.19477537274360657
keywords[13].display_name	Computer science
keywords[14].id	https://openalex.org/keywords/mathematical-optimization
keywords[14].score	0.1728718876838684
keywords[14].display_name	Mathematical optimization
keywords[15].id	https://openalex.org/keywords/physics
keywords[15].score	0.14904409646987915
keywords[15].display_name	Physics
keywords[16].id	https://openalex.org/keywords/statistics
keywords[16].score	0.11103245615959167
keywords[16].display_name	Statistics
keywords[17].id	https://openalex.org/keywords/machine-learning
keywords[17].score	0.07832729816436768
keywords[17].display_name	Machine learning
keywords[18].id	https://openalex.org/keywords/mathematical-analysis
keywords[18].score	0.05842462182044983
keywords[18].display_name	Mathematical analysis
keywords[19].id	https://openalex.org/keywords/quantum-mechanics
keywords[19].score	0.053871095180511475
keywords[19].display_name	Quantum mechanics
language	en
locations[0].id	pmh:oai:arXiv.org:1703.05449
locations[0].is_oa	True
locations[0].source.id	https://openalex.org/S4306400194
locations[0].source.issn
locations[0].source.type	repository
locations[0].source.is_oa	True
locations[0].source.issn_l
locations[0].source.is_core	False
locations[0].source.is_in_doaj	False
locations[0].source.display_name	arXiv (Cornell University)
locations[0].source.host_organization	https://openalex.org/I205783295
locations[0].source.host_organization_name	Cornell University
locations[0].source.host_organization_lineage	https://openalex.org/I205783295
locations[0].license
locations[0].pdf_url	https://arxiv.org/pdf/1703.05449
locations[0].version	submittedVersion
locations[0].raw_type	text
locations[0].license_id
locations[0].is_accepted	False
locations[0].is_published	False
locations[0].raw_source_name
locations[0].landing_page_url	http://arxiv.org/abs/1703.05449
locations[1].id	mag:2604884452
locations[1].is_oa	True
locations[1].source.id	https://openalex.org/S4306400194
locations[1].source.issn
locations[1].source.type	repository
locations[1].source.is_oa	True
locations[1].source.issn_l
locations[1].source.is_core	False
locations[1].source.is_in_doaj	False
locations[1].source.display_name	arXiv (Cornell University)
locations[1].source.host_organization	https://openalex.org/I205783295
locations[1].source.host_organization_name	Cornell University
locations[1].source.host_organization_lineage	https://openalex.org/I205783295
locations[1].license
locations[1].pdf_url
locations[1].version	submittedVersion
locations[1].raw_type
locations[1].license_id
locations[1].is_accepted	False
locations[1].is_published	False
locations[1].raw_source_name	arXiv (Cornell University)
locations[1].landing_page_url	https://arxiv.org/pdf/1703.05449.pdf
locations[2].id	doi:10.48550/arxiv.1703.05449
locations[2].is_oa	True
locations[2].source.id	https://openalex.org/S4306400194
locations[2].source.issn
locations[2].source.type	repository
locations[2].source.is_oa	True
locations[2].source.issn_l
locations[2].source.is_core	False
locations[2].source.is_in_doaj	False
locations[2].source.display_name	arXiv (Cornell University)
locations[2].source.host_organization	https://openalex.org/I205783295
locations[2].source.host_organization_name	Cornell University
locations[2].source.host_organization_lineage	https://openalex.org/I205783295
locations[2].license
locations[2].pdf_url
locations[2].version
locations[2].raw_type	article
locations[2].license_id
locations[2].is_accepted	False
locations[2].is_published
locations[2].raw_source_name
locations[2].landing_page_url	https://doi.org/10.48550/arxiv.1703.05449
locations[3].id	mag:2964054583
locations[3].is_oa	False
locations[3].source.id	https://openalex.org/S4306419644
locations[3].source.issn
locations[3].source.type	conference
locations[3].source.is_oa	False
locations[3].source.issn_l
locations[3].source.is_core	False
locations[3].source.is_in_doaj	False
locations[3].source.display_name	International Conference on Machine Learning
locations[3].source.host_organization
locations[3].source.host_organization_name
locations[3].license
locations[3].pdf_url
locations[3].version
locations[3].raw_type
locations[3].license_id
locations[3].is_accepted	False
locations[3].is_published
locations[3].raw_source_name	International Conference on Machine Learning
locations[3].landing_page_url	http://proceedings.mlr.press/v70/azar17a/azar17a.pdf
indexed_in	arxiv, datacite
authorships[0].author.id	https://openalex.org/A5043355670
authorships[0].author.orcid
authorships[0].author.display_name	Mohammad Gheshlaghi Azar
authorships[0].countries	GB
authorships[0].affiliations[0].institution_ids	https://openalex.org/I4210090411
authorships[0].affiliations[0].raw_affiliation_string	DeepMind, London, UK
authorships[0].institutions[0].id	https://openalex.org/I4210090411
authorships[0].institutions[0].ror	https://ror.org/00971b260
authorships[0].institutions[0].type	company
authorships[0].institutions[0].lineage	https://openalex.org/I4210090411, https://openalex.org/I4210128969
authorships[0].institutions[0].country_code	GB
authorships[0].institutions[0].display_name	DeepMind (United Kingdom)
authorships[0].author_position	first
authorships[0].raw_author_name	Mohammad Gheshlaghi Azar
authorships[0].is_corresponding	False
authorships[0].raw_affiliation_strings	DeepMind, London, UK
authorships[1].author.id	https://openalex.org/A5015899120
authorships[1].author.orcid
authorships[1].author.display_name	Ian Osband
authorships[1].countries	GB
authorships[1].affiliations[0].institution_ids	https://openalex.org/I4210090411
authorships[1].affiliations[0].raw_affiliation_string	DeepMind, London, UK
authorships[1].institutions[0].id	https://openalex.org/I4210090411
authorships[1].institutions[0].ror	https://ror.org/00971b260
authorships[1].institutions[0].type	company
authorships[1].institutions[0].lineage	https://openalex.org/I4210090411, https://openalex.org/I4210128969
authorships[1].institutions[0].country_code	GB
authorships[1].institutions[0].display_name	DeepMind (United Kingdom)
authorships[1].author_position	middle
authorships[1].raw_author_name	Ian Osband
authorships[1].is_corresponding	False
authorships[1].raw_affiliation_strings	DeepMind, London, UK
authorships[2].author.id	https://openalex.org/A5006533777
authorships[2].author.orcid
authorships[2].author.display_name	Rémi Munos
authorships[2].countries	GB
authorships[2].affiliations[0].institution_ids	https://openalex.org/I4210090411
authorships[2].affiliations[0].raw_affiliation_string	DeepMind, London, UK
authorships[2].institutions[0].id	https://openalex.org/I4210090411
authorships[2].institutions[0].ror	https://ror.org/00971b260
authorships[2].institutions[0].type	company
authorships[2].institutions[0].lineage	https://openalex.org/I4210090411, https://openalex.org/I4210128969
authorships[2].institutions[0].country_code	GB
authorships[2].institutions[0].display_name	DeepMind (United Kingdom)
authorships[2].author_position	last
authorships[2].raw_author_name	Rémi Munos
authorships[2].is_corresponding	False
authorships[2].raw_affiliation_strings	DeepMind, London, UK
has_content.pdf	False
has_content.grobid_xml	False
is_paratext	False
open_access.is_oa	True
open_access.oa_url	https://arxiv.org/pdf/1703.05449
open_access.oa_status	green
open_access.any_repository_has_fulltext	False
created_date	2025-10-10T00:00:00
display_name	Minimax Regret Bounds for Reinforcement Learning
has_fulltext	False
is_retracted	False
updated_date	2025-11-06T06:51:31.235846
primary_topic.id	https://openalex.org/T12101
primary_topic.field.id	https://openalex.org/fields/18
primary_topic.field.display_name	Decision Sciences
primary_topic.score	0.9998999834060669
primary_topic.domain.id	https://openalex.org/domains/2
primary_topic.domain.display_name	Social Sciences
primary_topic.subfield.id	https://openalex.org/subfields/1803
primary_topic.subfield.display_name	Management Science and Operations Research
primary_topic.display_name	Advanced Bandit Algorithms Research
related_works	https://openalex.org/W1850488217, https://openalex.org/W2119567691, https://openalex.org/W2121863487, https://openalex.org/W2963049774, https://openalex.org/W2145339207, https://openalex.org/W2907502549, https://openalex.org/W3046395471, https://openalex.org/W2257979135, https://openalex.org/W2119738618, https://openalex.org/W107583932, https://openalex.org/W2766447205, https://openalex.org/W2129670787, https://openalex.org/W2964000194, https://openalex.org/W2949608212, https://openalex.org/W2944264312, https://openalex.org/W2769648743, https://openalex.org/W2489939061, https://openalex.org/W2168405694, https://openalex.org/W1969276875, https://openalex.org/W1757796397
cited_by_count	50
counts_by_year[0].year	2025
counts_by_year[0].cited_by_count	1
counts_by_year[1].year	2024
counts_by_year[1].cited_by_count	2
counts_by_year[2].year	2023
counts_by_year[2].cited_by_count	2
counts_by_year[3].year	2022
counts_by_year[3].cited_by_count	3
counts_by_year[4].year	2021
counts_by_year[4].cited_by_count	13
counts_by_year[5].year	2020
counts_by_year[5].cited_by_count	14
counts_by_year[6].year	2019
counts_by_year[6].cited_by_count	10
counts_by_year[7].year	2018
counts_by_year[7].cited_by_count	3
counts_by_year[8].year	2017
counts_by_year[8].cited_by_count	2
locations_count	4
best_oa_location.id	pmh:oai:arXiv.org:1703.05449
best_oa_location.is_oa	True
best_oa_location.source.id	https://openalex.org/S4306400194
best_oa_location.source.issn
best_oa_location.source.type	repository
best_oa_location.source.is_oa	True
best_oa_location.source.issn_l
best_oa_location.source.is_core	False
best_oa_location.source.is_in_doaj	False
best_oa_location.source.display_name	arXiv (Cornell University)
best_oa_location.source.host_organization	https://openalex.org/I205783295
best_oa_location.source.host_organization_name	Cornell University
best_oa_location.source.host_organization_lineage	https://openalex.org/I205783295
best_oa_location.license
best_oa_location.pdf_url	https://arxiv.org/pdf/1703.05449
best_oa_location.version	submittedVersion
best_oa_location.raw_type	text
best_oa_location.license_id
best_oa_location.is_accepted	False
best_oa_location.is_published	False
best_oa_location.raw_source_name
best_oa_location.landing_page_url	http://arxiv.org/abs/1703.05449
primary_location.id	pmh:oai:arXiv.org:1703.05449
primary_location.is_oa	True
primary_location.source.id	https://openalex.org/S4306400194
primary_location.source.issn
primary_location.source.type	repository
primary_location.source.is_oa	True
primary_location.source.issn_l
primary_location.source.is_core	False
primary_location.source.is_in_doaj	False
primary_location.source.display_name	arXiv (Cornell University)
primary_location.source.host_organization	https://openalex.org/I205783295
primary_location.source.host_organization_name	Cornell University
primary_location.source.host_organization_lineage	https://openalex.org/I205783295
primary_location.license
primary_location.pdf_url	https://arxiv.org/pdf/1703.05449
primary_location.version	submittedVersion
primary_location.raw_type	text
primary_location.license_id
primary_location.is_accepted	False
primary_location.is_published	False
primary_location.raw_source_name
primary_location.landing_page_url	http://arxiv.org/abs/1703.05449
publication_date	2017-03-16
publication_year	2017
referenced_works	https://openalex.org/W1570963478, https://openalex.org/W2122701159, https://openalex.org/W2750990725, https://openalex.org/W2126163471, https://openalex.org/W21934178, https://openalex.org/W1576452626, https://openalex.org/W1505937442, https://openalex.org/W2489939061, https://openalex.org/W2083459869, https://openalex.org/W2049934117, https://openalex.org/W1786332878, https://openalex.org/W1998376807, https://openalex.org/W2120678009, https://openalex.org/W2120090487, https://openalex.org/W2073107347, https://openalex.org/W2312609093, https://openalex.org/W1850488217, https://openalex.org/W1988526405, https://openalex.org/W2039522160, https://openalex.org/W2121863487
referenced_works_count	20
abstract_inverted_index.+	31
abstract_inverted_index.a	25, 94, 108, 130
abstract_inverted_index.We	0, 15, 117
abstract_inverted_index.an	18
abstract_inverted_index.as	129
abstract_inverted_index.at	158
abstract_inverted_index.by	67
abstract_inverted_index.et	73
abstract_inverted_index.in	8, 141, 165
abstract_inverted_index.is	35, 83
abstract_inverted_index.it	91
abstract_inverted_index.of	4, 28, 42, 47, 53, 71, 79, 96, 104, 121, 154
abstract_inverted_index.to	21, 93, 107, 124, 134
abstract_inverted_index.up	106
abstract_inverted_index.we	144
abstract_inverted_index.$A$	44
abstract_inverted_index.$H$	34
abstract_inverted_index.$S$	39
abstract_inverted_index.$T$	50
abstract_inverted_index.(to	138, 162
abstract_inverted_index.H$,	90
abstract_inverted_index.Our	111
abstract_inverted_index.The	76
abstract_inverted_index.and	49, 88, 143
abstract_inverted_index.for	11
abstract_inverted_index.key	77, 115
abstract_inverted_index.new	81
abstract_inverted_index.our	80
abstract_inverted_index.the	2, 36, 40, 45, 51, 59, 68, 100, 125, 135, 151, 155, 159
abstract_inverted_index.two	114
abstract_inverted_index.use	118, 150
abstract_inverted_index.This	55
abstract_inverted_index.al.,	74
abstract_inverted_index.best	60
abstract_inverted_index.next	160
abstract_inverted_index.over	58
abstract_inverted_index.show	16
abstract_inverted_index.than	133
abstract_inverted_index.that	17, 84, 98, 149
abstract_inverted_index.time	37
abstract_inverted_index.when	85
abstract_inverted_index.$H$).	166
abstract_inverted_index.$S$),	142
abstract_inverted_index.2010.	75
abstract_inverted_index.MDPs.	14
abstract_inverted_index.UCRL2	69
abstract_inverted_index.bound	27, 63, 103
abstract_inverted_index.known	62
abstract_inverted_index.leads	92
abstract_inverted_index.lower	102
abstract_inverted_index.value	22, 127
abstract_inverted_index.where	33
abstract_inverted_index.$T\geq	86
abstract_inverted_index.Jaksch	72
abstract_inverted_index.define	145
abstract_inverted_index.finite	12
abstract_inverted_index.number	41, 46, 52
abstract_inverted_index.rather	132
abstract_inverted_index.regret	26, 95
abstract_inverted_index.result	56
abstract_inverted_index.states	161
abstract_inverted_index.values	157
abstract_inverted_index.whole,	131
abstract_inverted_index.$SA\geq	89
abstract_inverted_index.actions	48
abstract_inverted_index.careful	119
abstract_inverted_index.factor.	110
abstract_inverted_index.horizon	13
abstract_inverted_index.improve	139, 163
abstract_inverted_index.matches	99
abstract_inverted_index.optimal	6, 126
abstract_inverted_index.problem	3
abstract_inverted_index.results	82
abstract_inverted_index.scaling	140, 164
abstract_inverted_index.states,	43
abstract_inverted_index.H^3S^3A$	87
abstract_inverted_index.achieved	66
abstract_inverted_index.achieves	24
abstract_inverted_index.analysis	112
abstract_inverted_index.bonuses"	148
abstract_inverted_index.consider	1
abstract_inverted_index.contains	113
abstract_inverted_index.function	128
abstract_inverted_index.horizon,	38
abstract_inverted_index.improves	57
abstract_inverted_index.learning	10
abstract_inverted_index.previous	61
abstract_inverted_index.provably	5
abstract_inverted_index.variance	153
abstract_inverted_index.algorithm	70
abstract_inverted_index.empirical	152
abstract_inverted_index.estimated	156
abstract_inverted_index.insights.	116
abstract_inverted_index.iteration	23
abstract_inverted_index.optimistic	19
abstract_inverted_index.$\tilde{O}(	29
abstract_inverted_index.\sqrt{AT})$	65
abstract_inverted_index.\sqrt{HSAT}	30
abstract_inverted_index.application	120
abstract_inverted_index.established	101
abstract_inverted_index.exploration	7
abstract_inverted_index.logarithmic	109
abstract_inverted_index.time-steps.	54
abstract_inverted_index.transitions	136
abstract_inverted_index."exploration	147
abstract_inverted_index.inequalities	123
abstract_inverted_index.modification	20
abstract_inverted_index.significance	78
abstract_inverted_index.$\tilde{O}(HS	64
abstract_inverted_index.concentration	122
abstract_inverted_index.probabilities	137
abstract_inverted_index.reinforcement	9
abstract_inverted_index.Bernstein-based	146
abstract_inverted_index.$Ω(\sqrt{HSAT})$	105
abstract_inverted_index.H^2S^2A+H\sqrt{T})$	32
abstract_inverted_index.$\tilde{O}(\sqrt{HSAT})$	97
cited_by_percentile_year
countries_distinct_count	1
institutions_distinct_count	3
citation_normalized_percentile
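
The abstract_inverted_index.* rows in the payload above store the abstract as a map from each token to the word positions where it occurs. The following is a minimal sketch of how such rows could be turned back into running text, assuming each row is a key and a comma-separated position list split by a tab. The helper names parse_flattened and rebuild_abstract are hypothetical, and the sample rows are trimmed to the first four positions of the real index.

```python
PREFIX = "abstract_inverted_index."


def parse_flattened(rows):
    """Turn '<PREFIX><token><TAB><p1>, <p2>' rows into {token: [positions]}."""
    index = {}
    for row in rows:
        key, _, value = row.partition("\t")
        if not key.startswith(PREFIX):
            continue  # skip payload rows that are not part of the index
        token = key[len(PREFIX):]  # tokens may contain dots, so slice, don't split
        index[token] = [int(p) for p in value.split(",") if p.strip()]
    return index


def rebuild_abstract(index):
    """Place every token at each of its positions and join in position order."""
    positioned = sorted(
        (pos, token) for token, positions in index.items() for pos in positions
    )
    return " ".join(token for _, token in positioned)


# Trimmed sample: only positions 0 to 3 of the index listed above.
sample_rows = [
    "abstract_inverted_index.problem\t3",
    "abstract_inverted_index.consider\t1",
    "abstract_inverted_index.the\t2",
    "abstract_inverted_index.We\t0",
]
print(rebuild_abstract(parse_flattened(sample_rows)))  # prints: We consider the problem
```

Slicing off the prefix rather than splitting on "." matters here because tokens such as "al.," and "2010." contain dots themselves.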