Minimax Regret Bounds for Reinforcement Learning
2017 · Open Access · DOI: https://doi.org/10.48550/arxiv.1703.05449
We consider the problem of provably optimal exploration in reinforcement learning for finite-horizon MDPs. We show that an optimistic modification to value iteration achieves a regret bound of $\tilde{O}(\sqrt{HSAT} + H^2S^2A + H\sqrt{T})$, where $H$ is the time horizon, $S$ the number of states, $A$ the number of actions, and $T$ the number of time-steps. This result improves over the best previously known bound, $\tilde{O}(HS\sqrt{AT})$, achieved by the UCRL2 algorithm of Jaksch et al. (2010). The key significance of our new result is that, when $T \geq H^3S^3A$ and $SA \geq H$, it yields a regret of $\tilde{O}(\sqrt{HSAT})$, which matches the established lower bound of $\Omega(\sqrt{HSAT})$ up to a logarithmic factor. Our analysis contains two key insights. We apply concentration inequalities carefully to the optimal value function as a whole, rather than to the transition probabilities (to improve scaling in $S$), and we define Bernstein-based "exploration bonuses" that use the empirical variance of the estimated values at the next states (to improve scaling in $H$).
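To make the regime condition in the abstract concrete, the following minimal sketch evaluates the leading terms of the three bounds quoted above. It is an illustration under stated assumptions, not the authors' code: constants and logarithmic factors hidden by $\tilde{O}$ and $\Omega$ are ignored, and the function names (`optimistic_vi_bound`, `ucrl2_bound`, `lower_bound`, `in_minimax_regime`) are hypothetical.

```python
import math


def optimistic_vi_bound(H, S, A, T):
    """Leading terms sqrt(HSAT) + H^2 S^2 A + H sqrt(T) of the stated upper bound.

    Assumption: constants and log factors are dropped.
    """
    return math.sqrt(H * S * A * T) + H**2 * S**2 * A + H * math.sqrt(T)


def ucrl2_bound(H, S, A, T):
    """Leading term HS sqrt(AT) of the earlier UCRL2 bound (log factors dropped)."""
    return H * S * math.sqrt(A * T)


def lower_bound(H, S, A, T):
    """Leading term sqrt(HSAT) of the lower bound (constants dropped)."""
    return math.sqrt(H * S * A * T)


def in_minimax_regime(H, S, A, T):
    """Check the two conditions from the abstract: T >= H^3 S^3 A and SA >= H."""
    return T >= H**3 * S**3 * A and S * A >= H


if __name__ == "__main__":
    # Hypothetical problem sizes chosen only for illustration.
    H, S, A, T = 10, 20, 5, 10**8
    print("in regime:", in_minimax_regime(H, S, A, T))
    print("upper / lower:", optimistic_vi_bound(H, S, A, T) / lower_bound(H, S, A, T))
    print("UCRL2 / lower:", ucrl2_bound(H, S, A, T) / lower_bound(H, S, A, T))
```

When both conditions hold, each of the two lower-order terms is at most $\sqrt{HSAT}$: $T \geq H^3S^3A$ gives $H^2S^2A \leq \sqrt{HSAT}$, and $SA \geq H$ gives $H\sqrt{T} \leq \sqrt{HSAT}$. The sum is therefore within a factor of 3 of the leading term, while the UCRL2 term exceeds $\sqrt{HSAT}$ by a factor of $\sqrt{HS}$.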
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/1703.05449, https://arxiv.org/pdf/1703.05449
- OA Status: green
- Cited By: 50
- References: 20
- Related Works: 20
- OpenAlex ID: https://openalex.org/W2604884452
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W2604884452
- DOI: https://doi.org/10.48550/arxiv.1703.05449
- Title: Minimax Regret Bounds for Reinforcement Learning
- Type: preprint
- Language: en
- Publication year: 2017
- Publication date: 2017-03-16
- Authors: Mohammad Gheshlaghi Azar, Ian Osband, Rémi Munos
- Landing page: https://arxiv.org/abs/1703.05449
- PDF URL: https://arxiv.org/pdf/1703.05449
- Open access: Yes
- OA status: green
- OA URL: https://arxiv.org/pdf/1703.05449
- Concepts: Regret, Logarithm, Combinatorics, Upper and lower bounds, Scaling, Minimax, Reinforcement learning, Omega, Mathematics, Horizon, Function (biology), Key (lock), Discrete mathematics, Computer science, Mathematical optimization, Physics, Statistics, Machine learning, Mathematical analysis, Quantum mechanics, Computer security, Geometry, Biology, Evolutionary biology
- Cited by: 50
- Citations by year (recent): 2025: 1, 2024: 2, 2023: 2, 2022: 3, 2021: 13
- References (count): 20
- Related works (count): 20
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W2604884452 |
| doi | https://doi.org/10.48550/arxiv.1703.05449 |
| ids.doi | https://doi.org/10.48550/arxiv.1703.05449 |
| ids.mag | 2604884452 |
| ids.openalex | https://openalex.org/W2604884452 |
| fwci | |
| type | preprint |
| title | Minimax Regret Bounds for Reinforcement Learning |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | 272 |
| biblio.first_page | 263 |
| topics[0].id | https://openalex.org/T12101 |
| topics[0].field.id | https://openalex.org/fields/18 |
| topics[0].field.display_name | Decision Sciences |
| topics[0].score | 0.9998999834060669 |
| topics[0].domain.id | https://openalex.org/domains/2 |
| topics[0].domain.display_name | Social Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1803 |
| topics[0].subfield.display_name | Management Science and Operations Research |
| topics[0].display_name | Advanced Bandit Algorithms Research |
| topics[1].id | https://openalex.org/T10462 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9983999729156494 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Reinforcement Learning in Robotics |
| topics[2].id | https://openalex.org/T12072 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9909999966621399 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Machine Learning and Algorithms |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C50817715 |
| concepts[0].level | 2 |
| concepts[0].score | 0.7074787020683289 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q79895177 |
| concepts[0].display_name | Regret |
| concepts[1].id | https://openalex.org/C39927690 |
| concepts[1].level | 2 |
| concepts[1].score | 0.659284770488739 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q11197 |
| concepts[1].display_name | Logarithm |
| concepts[2].id | https://openalex.org/C114614502 |
| concepts[2].level | 1 |
| concepts[2].score | 0.6382050514221191 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q76592 |
| concepts[2].display_name | Combinatorics |
| concepts[3].id | https://openalex.org/C77553402 |
| concepts[3].level | 2 |
| concepts[3].score | 0.6331272125244141 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q13222579 |
| concepts[3].display_name | Upper and lower bounds |
| concepts[4].id | https://openalex.org/C99844830 |
| concepts[4].level | 2 |
| concepts[4].score | 0.5872185826301575 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q102441924 |
| concepts[4].display_name | Scaling |
| concepts[5].id | https://openalex.org/C149728462 |
| concepts[5].level | 2 |
| concepts[5].score | 0.5683568716049194 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q751319 |
| concepts[5].display_name | Minimax |
| concepts[6].id | https://openalex.org/C97541855 |
| concepts[6].level | 2 |
| concepts[6].score | 0.5659250617027283 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q830687 |
| concepts[6].display_name | Reinforcement learning |
| concepts[7].id | https://openalex.org/C2779557605 |
| concepts[7].level | 2 |
| concepts[7].score | 0.5628350377082825 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q9890 |
| concepts[7].display_name | Omega |
| concepts[8].id | https://openalex.org/C33923547 |
| concepts[8].level | 0 |
| concepts[8].score | 0.5451778769493103 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[8].display_name | Mathematics |
| concepts[9].id | https://openalex.org/C159176650 |
| concepts[9].level | 2 |
| concepts[9].score | 0.5010685920715332 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q43261 |
| concepts[9].display_name | Horizon |
| concepts[10].id | https://openalex.org/C14036430 |
| concepts[10].level | 2 |
| concepts[10].score | 0.46781039237976074 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q3736076 |
| concepts[10].display_name | Function (biology) |
| concepts[11].id | https://openalex.org/C26517878 |
| concepts[11].level | 2 |
| concepts[11].score | 0.44248804450035095 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q228039 |
| concepts[11].display_name | Key (lock) |
| concepts[12].id | https://openalex.org/C118615104 |
| concepts[12].level | 1 |
| concepts[12].score | 0.38671889901161194 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q121416 |
| concepts[12].display_name | Discrete mathematics |
| concepts[13].id | https://openalex.org/C41008148 |
| concepts[13].level | 0 |
| concepts[13].score | 0.19477537274360657 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[13].display_name | Computer science |
| concepts[14].id | https://openalex.org/C126255220 |
| concepts[14].level | 1 |
| concepts[14].score | 0.1728718876838684 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q141495 |
| concepts[14].display_name | Mathematical optimization |
| concepts[15].id | https://openalex.org/C121332964 |
| concepts[15].level | 0 |
| concepts[15].score | 0.14904409646987915 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q413 |
| concepts[15].display_name | Physics |
| concepts[16].id | https://openalex.org/C105795698 |
| concepts[16].level | 1 |
| concepts[16].score | 0.11103245615959167 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q12483 |
| concepts[16].display_name | Statistics |
| concepts[17].id | https://openalex.org/C119857082 |
| concepts[17].level | 1 |
| concepts[17].score | 0.07832729816436768 |
| concepts[17].wikidata | https://www.wikidata.org/wiki/Q2539 |
| concepts[17].display_name | Machine learning |
| concepts[18].id | https://openalex.org/C134306372 |
| concepts[18].level | 1 |
| concepts[18].score | 0.05842462182044983 |
| concepts[18].wikidata | https://www.wikidata.org/wiki/Q7754 |
| concepts[18].display_name | Mathematical analysis |
| concepts[19].id | https://openalex.org/C62520636 |
| concepts[19].level | 1 |
| concepts[19].score | 0.053871095180511475 |
| concepts[19].wikidata | https://www.wikidata.org/wiki/Q944 |
| concepts[19].display_name | Quantum mechanics |
| concepts[20].id | https://openalex.org/C38652104 |
| concepts[20].level | 1 |
| concepts[20].score | 0.0 |
| concepts[20].wikidata | https://www.wikidata.org/wiki/Q3510521 |
| concepts[20].display_name | Computer security |
| concepts[21].id | https://openalex.org/C2524010 |
| concepts[21].level | 1 |
| concepts[21].score | 0.0 |
| concepts[21].wikidata | https://www.wikidata.org/wiki/Q8087 |
| concepts[21].display_name | Geometry |
| concepts[22].id | https://openalex.org/C86803240 |
| concepts[22].level | 0 |
| concepts[22].score | 0.0 |
| concepts[22].wikidata | https://www.wikidata.org/wiki/Q420 |
| concepts[22].display_name | Biology |
| concepts[23].id | https://openalex.org/C78458016 |
| concepts[23].level | 1 |
| concepts[23].score | 0.0 |
| concepts[23].wikidata | https://www.wikidata.org/wiki/Q840400 |
| concepts[23].display_name | Evolutionary biology |
| keywords[0].id | https://openalex.org/keywords/regret |
| keywords[0].score | 0.7074787020683289 |
| keywords[0].display_name | Regret |
| keywords[1].id | https://openalex.org/keywords/logarithm |
| keywords[1].score | 0.659284770488739 |
| keywords[1].display_name | Logarithm |
| keywords[2].id | https://openalex.org/keywords/combinatorics |
| keywords[2].score | 0.6382050514221191 |
| keywords[2].display_name | Combinatorics |
| keywords[3].id | https://openalex.org/keywords/upper-and-lower-bounds |
| keywords[3].score | 0.6331272125244141 |
| keywords[3].display_name | Upper and lower bounds |
| keywords[4].id | https://openalex.org/keywords/scaling |
| keywords[4].score | 0.5872185826301575 |
| keywords[4].display_name | Scaling |
| keywords[5].id | https://openalex.org/keywords/minimax |
| keywords[5].score | 0.5683568716049194 |
| keywords[5].display_name | Minimax |
| keywords[6].id | https://openalex.org/keywords/reinforcement-learning |
| keywords[6].score | 0.5659250617027283 |
| keywords[6].display_name | Reinforcement learning |
| keywords[7].id | https://openalex.org/keywords/omega |
| keywords[7].score | 0.5628350377082825 |
| keywords[7].display_name | Omega |
| keywords[8].id | https://openalex.org/keywords/mathematics |
| keywords[8].score | 0.5451778769493103 |
| keywords[8].display_name | Mathematics |
| keywords[9].id | https://openalex.org/keywords/horizon |
| keywords[9].score | 0.5010685920715332 |
| keywords[9].display_name | Horizon |
| keywords[10].id | https://openalex.org/keywords/function |
| keywords[10].score | 0.46781039237976074 |
| keywords[10].display_name | Function (biology) |
| keywords[11].id | https://openalex.org/keywords/key |
| keywords[11].score | 0.44248804450035095 |
| keywords[11].display_name | Key (lock) |
| keywords[12].id | https://openalex.org/keywords/discrete-mathematics |
| keywords[12].score | 0.38671889901161194 |
| keywords[12].display_name | Discrete mathematics |
| keywords[13].id | https://openalex.org/keywords/computer-science |
| keywords[13].score | 0.19477537274360657 |
| keywords[13].display_name | Computer science |
| keywords[14].id | https://openalex.org/keywords/mathematical-optimization |
| keywords[14].score | 0.1728718876838684 |
| keywords[14].display_name | Mathematical optimization |
| keywords[15].id | https://openalex.org/keywords/physics |
| keywords[15].score | 0.14904409646987915 |
| keywords[15].display_name | Physics |
| keywords[16].id | https://openalex.org/keywords/statistics |
| keywords[16].score | 0.11103245615959167 |
| keywords[16].display_name | Statistics |
| keywords[17].id | https://openalex.org/keywords/machine-learning |
| keywords[17].score | 0.07832729816436768 |
| keywords[17].display_name | Machine learning |
| keywords[18].id | https://openalex.org/keywords/mathematical-analysis |
| keywords[18].score | 0.05842462182044983 |
| keywords[18].display_name | Mathematical analysis |
| keywords[19].id | https://openalex.org/keywords/quantum-mechanics |
| keywords[19].score | 0.053871095180511475 |
| keywords[19].display_name | Quantum mechanics |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:1703.05449 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/1703.05449 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/1703.05449 |
| locations[1].id | mag:2604884452 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | submittedVersion |
| locations[1].raw_type | |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | False |
| locations[1].raw_source_name | arXiv (Cornell University) |
| locations[1].landing_page_url | https://arxiv.org/pdf/1703.05449.pdf |
| locations[2].id | doi:10.48550/arxiv.1703.05449 |
| locations[2].is_oa | True |
| locations[2].source.id | https://openalex.org/S4306400194 |
| locations[2].source.issn | |
| locations[2].source.type | repository |
| locations[2].source.is_oa | True |
| locations[2].source.issn_l | |
| locations[2].source.is_core | False |
| locations[2].source.is_in_doaj | False |
| locations[2].source.display_name | arXiv (Cornell University) |
| locations[2].source.host_organization | https://openalex.org/I205783295 |
| locations[2].source.host_organization_name | Cornell University |
| locations[2].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[2].license | |
| locations[2].pdf_url | |
| locations[2].version | |
| locations[2].raw_type | article |
| locations[2].license_id | |
| locations[2].is_accepted | False |
| locations[2].is_published | |
| locations[2].raw_source_name | |
| locations[2].landing_page_url | https://doi.org/10.48550/arxiv.1703.05449 |
| locations[3].id | mag:2964054583 |
| locations[3].is_oa | False |
| locations[3].source.id | https://openalex.org/S4306419644 |
| locations[3].source.issn | |
| locations[3].source.type | conference |
| locations[3].source.is_oa | False |
| locations[3].source.issn_l | |
| locations[3].source.is_core | False |
| locations[3].source.is_in_doaj | False |
| locations[3].source.display_name | International Conference on Machine Learning |
| locations[3].source.host_organization | |
| locations[3].source.host_organization_name | |
| locations[3].license | |
| locations[3].pdf_url | |
| locations[3].version | |
| locations[3].raw_type | |
| locations[3].license_id | |
| locations[3].is_accepted | False |
| locations[3].is_published | |
| locations[3].raw_source_name | International Conference on Machine Learning |
| locations[3].landing_page_url | http://proceedings.mlr.press/v70/azar17a/azar17a.pdf |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5043355670 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Mohammad Gheshlaghi Azar |
| authorships[0].countries | GB |
| authorships[0].affiliations[0].institution_ids | https://openalex.org/I4210090411 |
| authorships[0].affiliations[0].raw_affiliation_string | DeepMind, London, UK |
| authorships[0].institutions[0].id | https://openalex.org/I4210090411 |
| authorships[0].institutions[0].ror | https://ror.org/00971b260 |
| authorships[0].institutions[0].type | company |
| authorships[0].institutions[0].lineage | https://openalex.org/I4210090411, https://openalex.org/I4210128969 |
| authorships[0].institutions[0].country_code | GB |
| authorships[0].institutions[0].display_name | DeepMind (United Kingdom) |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Mohammad Gheshlaghi Azar |
| authorships[0].is_corresponding | False |
| authorships[0].raw_affiliation_strings | DeepMind, London, UK |
| authorships[1].author.id | https://openalex.org/A5015899120 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Ian Osband |
| authorships[1].countries | GB |
| authorships[1].affiliations[0].institution_ids | https://openalex.org/I4210090411 |
| authorships[1].affiliations[0].raw_affiliation_string | DeepMind, London, UK |
| authorships[1].institutions[0].id | https://openalex.org/I4210090411 |
| authorships[1].institutions[0].ror | https://ror.org/00971b260 |
| authorships[1].institutions[0].type | company |
| authorships[1].institutions[0].lineage | https://openalex.org/I4210090411, https://openalex.org/I4210128969 |
| authorships[1].institutions[0].country_code | GB |
| authorships[1].institutions[0].display_name | DeepMind (United Kingdom) |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Ian Osband |
| authorships[1].is_corresponding | False |
| authorships[1].raw_affiliation_strings | DeepMind, London, UK |
| authorships[2].author.id | https://openalex.org/A5006533777 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Rémi Munos |
| authorships[2].countries | GB |
| authorships[2].affiliations[0].institution_ids | https://openalex.org/I4210090411 |
| authorships[2].affiliations[0].raw_affiliation_string | DeepMind, London, UK |
| authorships[2].institutions[0].id | https://openalex.org/I4210090411 |
| authorships[2].institutions[0].ror | https://ror.org/00971b260 |
| authorships[2].institutions[0].type | company |
| authorships[2].institutions[0].lineage | https://openalex.org/I4210090411, https://openalex.org/I4210128969 |
| authorships[2].institutions[0].country_code | GB |
| authorships[2].institutions[0].display_name | DeepMind (United Kingdom) |
| authorships[2].author_position | last |
| authorships[2].raw_author_name | Rémi Munos |
| authorships[2].is_corresponding | False |
| authorships[2].raw_affiliation_strings | DeepMind, London, UK |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/1703.05449 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Minimax Regret Bounds for Reinforcement Learning |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T12101 |
| primary_topic.field.id | https://openalex.org/fields/18 |
| primary_topic.field.display_name | Decision Sciences |
| primary_topic.score | 0.9998999834060669 |
| primary_topic.domain.id | https://openalex.org/domains/2 |
| primary_topic.domain.display_name | Social Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1803 |
| primary_topic.subfield.display_name | Management Science and Operations Research |
| primary_topic.display_name | Advanced Bandit Algorithms Research |
| related_works | https://openalex.org/W1850488217, https://openalex.org/W2119567691, https://openalex.org/W2121863487, https://openalex.org/W2963049774, https://openalex.org/W2145339207, https://openalex.org/W2907502549, https://openalex.org/W3046395471, https://openalex.org/W2257979135, https://openalex.org/W2119738618, https://openalex.org/W107583932, https://openalex.org/W2766447205, https://openalex.org/W2129670787, https://openalex.org/W2964000194, https://openalex.org/W2949608212, https://openalex.org/W2944264312, https://openalex.org/W2769648743, https://openalex.org/W2489939061, https://openalex.org/W2168405694, https://openalex.org/W1969276875, https://openalex.org/W1757796397 |
| cited_by_count | 50 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 1 |
| counts_by_year[1].year | 2024 |
| counts_by_year[1].cited_by_count | 2 |
| counts_by_year[2].year | 2023 |
| counts_by_year[2].cited_by_count | 2 |
| counts_by_year[3].year | 2022 |
| counts_by_year[3].cited_by_count | 3 |
| counts_by_year[4].year | 2021 |
| counts_by_year[4].cited_by_count | 13 |
| counts_by_year[5].year | 2020 |
| counts_by_year[5].cited_by_count | 14 |
| counts_by_year[6].year | 2019 |
| counts_by_year[6].cited_by_count | 10 |
| counts_by_year[7].year | 2018 |
| counts_by_year[7].cited_by_count | 3 |
| counts_by_year[8].year | 2017 |
| counts_by_year[8].cited_by_count | 2 |
| locations_count | 4 |
| best_oa_location.id | pmh:oai:arXiv.org:1703.05449 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/1703.05449 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/1703.05449 |
| primary_location.id | pmh:oai:arXiv.org:1703.05449 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/1703.05449 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/1703.05449 |
| publication_date | 2017-03-16 |
| publication_year | 2017 |
| referenced_works | https://openalex.org/W1570963478, https://openalex.org/W2122701159, https://openalex.org/W2750990725, https://openalex.org/W2126163471, https://openalex.org/W21934178, https://openalex.org/W1576452626, https://openalex.org/W1505937442, https://openalex.org/W2489939061, https://openalex.org/W2083459869, https://openalex.org/W2049934117, https://openalex.org/W1786332878, https://openalex.org/W1998376807, https://openalex.org/W2120678009, https://openalex.org/W2120090487, https://openalex.org/W2073107347, https://openalex.org/W2312609093, https://openalex.org/W1850488217, https://openalex.org/W1988526405, https://openalex.org/W2039522160, https://openalex.org/W2121863487 |
| referenced_works_count | 20 |
| abstract_inverted_index.+ | 31 |
| abstract_inverted_index.a | 25, 94, 108, 130 |
| abstract_inverted_index.We | 0, 15, 117 |
| abstract_inverted_index.an | 18 |
| abstract_inverted_index.as | 129 |
| abstract_inverted_index.at | 158 |
| abstract_inverted_index.by | 67 |
| abstract_inverted_index.et | 73 |
| abstract_inverted_index.in | 8, 141, 165 |
| abstract_inverted_index.is | 35, 83 |
| abstract_inverted_index.it | 91 |
| abstract_inverted_index.of | 4, 28, 42, 47, 53, 71, 79, 96, 104, 121, 154 |
| abstract_inverted_index.to | 21, 93, 107, 124, 134 |
| abstract_inverted_index.up | 106 |
| abstract_inverted_index.we | 144 |
| abstract_inverted_index.$A$ | 44 |
| abstract_inverted_index.$H$ | 34 |
| abstract_inverted_index.$S$ | 39 |
| abstract_inverted_index.$T$ | 50 |
| abstract_inverted_index.(to | 138, 162 |
| abstract_inverted_index.H$, | 90 |
| abstract_inverted_index.Our | 111 |
| abstract_inverted_index.The | 76 |
| abstract_inverted_index.and | 49, 88, 143 |
| abstract_inverted_index.for | 11 |
| abstract_inverted_index.key | 77, 115 |
| abstract_inverted_index.new | 81 |
| abstract_inverted_index.our | 80 |
| abstract_inverted_index.the | 2, 36, 40, 45, 51, 59, 68, 100, 125, 135, 151, 155, 159 |
| abstract_inverted_index.two | 114 |
| abstract_inverted_index.use | 118, 150 |
| abstract_inverted_index.This | 55 |
| abstract_inverted_index.al., | 74 |
| abstract_inverted_index.best | 60 |
| abstract_inverted_index.next | 160 |
| abstract_inverted_index.over | 58 |
| abstract_inverted_index.show | 16 |
| abstract_inverted_index.than | 133 |
| abstract_inverted_index.that | 17, 84, 98, 149 |
| abstract_inverted_index.time | 37 |
| abstract_inverted_index.when | 85 |
| abstract_inverted_index.$H$). | 166 |
| abstract_inverted_index.$S$), | 142 |
| abstract_inverted_index.2010. | 75 |
| abstract_inverted_index.MDPs. | 14 |
| abstract_inverted_index.UCRL2 | 69 |
| abstract_inverted_index.bound | 27, 63, 103 |
| abstract_inverted_index.known | 62 |
| abstract_inverted_index.leads | 92 |
| abstract_inverted_index.lower | 102 |
| abstract_inverted_index.value | 22, 127 |
| abstract_inverted_index.where | 33 |
| abstract_inverted_index.$T\geq | 86 |
| abstract_inverted_index.Jaksch | 72 |
| abstract_inverted_index.define | 145 |
| abstract_inverted_index.finite | 12 |
| abstract_inverted_index.number | 41, 46, 52 |
| abstract_inverted_index.rather | 132 |
| abstract_inverted_index.regret | 26, 95 |
| abstract_inverted_index.result | 56 |
| abstract_inverted_index.states | 161 |
| abstract_inverted_index.values | 157 |
| abstract_inverted_index.whole, | 131 |
| abstract_inverted_index.$SA\geq | 89 |
| abstract_inverted_index.actions | 48 |
| abstract_inverted_index.careful | 119 |
| abstract_inverted_index.factor. | 110 |
| abstract_inverted_index.horizon | 13 |
| abstract_inverted_index.improve | 139, 163 |
| abstract_inverted_index.matches | 99 |
| abstract_inverted_index.optimal | 6, 126 |
| abstract_inverted_index.problem | 3 |
| abstract_inverted_index.results | 82 |
| abstract_inverted_index.scaling | 140, 164 |
| abstract_inverted_index.states, | 43 |
| abstract_inverted_index.H^3S^3A$ | 87 |
| abstract_inverted_index.achieved | 66 |
| abstract_inverted_index.achieves | 24 |
| abstract_inverted_index.analysis | 112 |
| abstract_inverted_index.bonuses" | 148 |
| abstract_inverted_index.consider | 1 |
| abstract_inverted_index.contains | 113 |
| abstract_inverted_index.function | 128 |
| abstract_inverted_index.horizon, | 38 |
| abstract_inverted_index.improves | 57 |
| abstract_inverted_index.learning | 10 |
| abstract_inverted_index.previous | 61 |
| abstract_inverted_index.provably | 5 |
| abstract_inverted_index.variance | 153 |
| abstract_inverted_index.algorithm | 70 |
| abstract_inverted_index.empirical | 152 |
| abstract_inverted_index.estimated | 156 |
| abstract_inverted_index.insights. | 116 |
| abstract_inverted_index.iteration | 23 |
| abstract_inverted_index.optimistic | 19 |
| abstract_inverted_index.$\tilde{O}( | 29 |
| abstract_inverted_index.\sqrt{AT})$ | 65 |
| abstract_inverted_index.\sqrt{HSAT} | 30 |
| abstract_inverted_index.application | 120 |
| abstract_inverted_index.established | 101 |
| abstract_inverted_index.exploration | 7 |
| abstract_inverted_index.logarithmic | 109 |
| abstract_inverted_index.time-steps. | 54 |
| abstract_inverted_index.transitions | 136 |
| abstract_inverted_index."exploration | 147 |
| abstract_inverted_index.inequalities | 123 |
| abstract_inverted_index.modification | 20 |
| abstract_inverted_index.significance | 78 |
| abstract_inverted_index.$\tilde{O}(HS | 64 |
| abstract_inverted_index.concentration | 122 |
| abstract_inverted_index.probabilities | 137 |
| abstract_inverted_index.reinforcement | 9 |
| abstract_inverted_index.Bernstein-based | 146 |
| abstract_inverted_index.$Ω(\sqrt{HSAT})$ | 105 |
| abstract_inverted_index.H^2S^2A+H\sqrt{T})$ | 32 |
| abstract_inverted_index.$\tilde{O}(\sqrt{HSAT})$ | 97 |
| cited_by_percentile_year | |
| countries_distinct_count | 1 |
| institutions_distinct_count | 3 |
| citation_normalized_percentile |