On Optimistic versus Randomized Exploration in Reinforcement Learning
2017 · Open Access · DOI: https://doi.org/10.48550/arxiv.1706.04241
We discuss the relative merits of optimistic and randomized approaches to exploration in reinforcement learning. Optimistic approaches presented in the literature apply an optimistic boost to the value estimate at each state-action pair and select actions that are greedy with respect to the resulting optimistic value function. Randomized approaches sample from among statistically plausible value functions and select actions that are greedy with respect to the random sample. Prior computational experience suggests that randomized approaches can lead to far more statistically efficient learning. We present two simple analytic examples that elucidate why this is the case. In principle, there should be optimistic approaches that fare well relative to randomized approaches, but that would require intractable computation. Optimistic approaches that have been proposed in the literature sacrifice statistical efficiency for the sake of computational efficiency. Randomized approaches, on the other hand, may enable simultaneous statistical and computational efficiency.
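The two action-selection rules described in the abstract can be contrasted in a minimal sketch for a single state. Everything specific here is an illustrative assumption rather than the authors' construction: the bonus form `bonus_scale / sqrt(n + 1)`, the independent Gaussian sampling around each estimate, and the names `optimistic_values`, `sampled_values`, and `greedy` are all hypothetical.

```python
import math
import random


def optimistic_values(q_mean, counts, bonus_scale=1.0):
    """Add an optimistic boost to each action's value estimate.

    Assumed bonus form: bonus_scale / sqrt(n + 1), which shrinks as an
    action is tried more often.
    """
    return [q + bonus_scale / math.sqrt(n + 1) for q, n in zip(q_mean, counts)]


def sampled_values(q_mean, counts, rng, noise_scale=1.0):
    """Draw one random sample of plausible values for each action.

    Assumed model: independent Gaussians centred on the estimates, with a
    standard deviation that shrinks as an action is tried more often.
    """
    return [
        rng.gauss(q, noise_scale / math.sqrt(n + 1))
        for q, n in zip(q_mean, counts)
    ]


def greedy(values):
    """Return the index of the largest value (first index on ties)."""
    return max(range(len(values)), key=lambda a: values[a])


if __name__ == "__main__":
    q_mean = [0.5, 0.4, 0.0]   # current value estimates for three actions
    counts = [100, 100, 0]     # how often each action has been tried
    rng = random.Random(0)

    # Optimistic rule: the untried action 2 receives the largest boost.
    print("optimistic:", greedy(optimistic_values(q_mean, counts)))

    # Randomized rule: the chosen action varies with the random sample.
    print("randomized:", greedy(sampled_values(q_mean, counts, rng)))
```

In both rules the agent acts greedily; they differ only in which value function it is greedy with respect to. The sketch does not reproduce the paper's analytic examples, which concern how these choices interact across state-action pairs.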
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/1706.04241, https://arxiv.org/pdf/1706.04241
- OA Status: green
- Cited By: 9
- References: 2
- Related Works: 10
- OpenAlex ID: https://openalex.org/W2625705959
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W2625705959
- DOI: https://doi.org/10.48550/arxiv.1706.04241
- Title: On Optimistic versus Randomized Exploration in Reinforcement Learning
- Type: preprint
- Language: en
- Publication year: 2017
- Publication date: 2017-06-13
- Authors: Ian Osband, Benjamin Van Roy
- Landing page: https://arxiv.org/abs/1706.04241
- PDF URL: https://arxiv.org/pdf/1706.04241
- Open access: Yes
- OA status: green
- OA URL: https://arxiv.org/pdf/1706.04241
- Concepts: Reinforcement learning, Computer science, Artificial intelligence, Randomized experiment, Bellman equation, Randomized algorithm, Machine learning, Randomized controlled trial, Value (mathematics), Sample (material), Mathematical optimization, Mathematics, Statistics, Algorithm, Chemistry, Medicine, Chromatography, Surgery
- Cited by: 9
- Citations by year (recent): 2021: 1, 2020: 3, 2019: 1, 2018: 3, 2017: 1
- References (count): 2
- Related works (count): 10
Full payload
| field | value |
|---|---|
| id | https://openalex.org/W2625705959 |
| doi | https://doi.org/10.48550/arxiv.1706.04241 |
| ids.doi | https://doi.org/10.48550/arxiv.1706.04241 |
| ids.mag | 2625705959 |
| ids.openalex | https://openalex.org/W2625705959 |
| fwci | |
| type | preprint |
| title | On Optimistic versus Randomized Exploration in Reinforcement Learning |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11975 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9984999895095825 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Evolutionary Algorithms and Applications |
| topics[1].id | https://openalex.org/T10848 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9943000078201294 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1703 |
| topics[1].subfield.display_name | Computational Theory and Mathematics |
| topics[1].display_name | Advanced Multi-Objective Optimization Algorithms |
| topics[2].id | https://openalex.org/T12101 |
| topics[2].field.id | https://openalex.org/fields/18 |
| topics[2].field.display_name | Decision Sciences |
| topics[2].score | 0.9926000237464905 |
| topics[2].domain.id | https://openalex.org/domains/2 |
| topics[2].domain.display_name | Social Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1803 |
| topics[2].subfield.display_name | Management Science and Operations Research |
| topics[2].display_name | Advanced Bandit Algorithms Research |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C97541855 |
| concepts[0].level | 2 |
| concepts[0].score | 0.7793627977371216 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q830687 |
| concepts[0].display_name | Reinforcement learning |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.6636861562728882 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C154945302 |
| concepts[2].level | 1 |
| concepts[2].score | 0.5143705606460571 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[2].display_name | Artificial intelligence |
| concepts[3].id | https://openalex.org/C155108698 |
| concepts[3].level | 2 |
| concepts[3].score | 0.5076652765274048 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q1231081 |
| concepts[3].display_name | Randomized experiment |
| concepts[4].id | https://openalex.org/C14646407 |
| concepts[4].level | 2 |
| concepts[4].score | 0.480546772480011 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q1430750 |
| concepts[4].display_name | Bellman equation |
| concepts[5].id | https://openalex.org/C128669082 |
| concepts[5].level | 2 |
| concepts[5].score | 0.47487255930900574 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q583461 |
| concepts[5].display_name | Randomized algorithm |
| concepts[6].id | https://openalex.org/C119857082 |
| concepts[6].level | 1 |
| concepts[6].score | 0.4628215730190277 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q2539 |
| concepts[6].display_name | Machine learning |
| concepts[7].id | https://openalex.org/C168563851 |
| concepts[7].level | 2 |
| concepts[7].score | 0.45829707384109497 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q1436668 |
| concepts[7].display_name | Randomized controlled trial |
| concepts[8].id | https://openalex.org/C2776291640 |
| concepts[8].level | 2 |
| concepts[8].score | 0.4566165804862976 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q2912517 |
| concepts[8].display_name | Value (mathematics) |
| concepts[9].id | https://openalex.org/C198531522 |
| concepts[9].level | 2 |
| concepts[9].score | 0.4205012917518616 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q485146 |
| concepts[9].display_name | Sample (material) |
| concepts[10].id | https://openalex.org/C126255220 |
| concepts[10].level | 1 |
| concepts[10].score | 0.3413822054862976 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q141495 |
| concepts[10].display_name | Mathematical optimization |
| concepts[11].id | https://openalex.org/C33923547 |
| concepts[11].level | 0 |
| concepts[11].score | 0.22781258821487427 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[11].display_name | Mathematics |
| concepts[12].id | https://openalex.org/C105795698 |
| concepts[12].level | 1 |
| concepts[12].score | 0.1854797601699829 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q12483 |
| concepts[12].display_name | Statistics |
| concepts[13].id | https://openalex.org/C11413529 |
| concepts[13].level | 1 |
| concepts[13].score | 0.1613936722278595 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q8366 |
| concepts[13].display_name | Algorithm |
| concepts[14].id | https://openalex.org/C185592680 |
| concepts[14].level | 0 |
| concepts[14].score | 0.0 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q2329 |
| concepts[14].display_name | Chemistry |
| concepts[15].id | https://openalex.org/C71924100 |
| concepts[15].level | 0 |
| concepts[15].score | 0.0 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q11190 |
| concepts[15].display_name | Medicine |
| concepts[16].id | https://openalex.org/C43617362 |
| concepts[16].level | 1 |
| concepts[16].score | 0.0 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q170050 |
| concepts[16].display_name | Chromatography |
| concepts[17].id | https://openalex.org/C141071460 |
| concepts[17].level | 1 |
| concepts[17].score | 0.0 |
| concepts[17].wikidata | https://www.wikidata.org/wiki/Q40821 |
| concepts[17].display_name | Surgery |
| keywords[0].id | https://openalex.org/keywords/reinforcement-learning |
| keywords[0].score | 0.7793627977371216 |
| keywords[0].display_name | Reinforcement learning |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.6636861562728882 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[2].score | 0.5143705606460571 |
| keywords[2].display_name | Artificial intelligence |
| keywords[3].id | https://openalex.org/keywords/randomized-experiment |
| keywords[3].score | 0.5076652765274048 |
| keywords[3].display_name | Randomized experiment |
| keywords[4].id | https://openalex.org/keywords/bellman-equation |
| keywords[4].score | 0.480546772480011 |
| keywords[4].display_name | Bellman equation |
| keywords[5].id | https://openalex.org/keywords/randomized-algorithm |
| keywords[5].score | 0.47487255930900574 |
| keywords[5].display_name | Randomized algorithm |
| keywords[6].id | https://openalex.org/keywords/machine-learning |
| keywords[6].score | 0.4628215730190277 |
| keywords[6].display_name | Machine learning |
| keywords[7].id | https://openalex.org/keywords/randomized-controlled-trial |
| keywords[7].score | 0.45829707384109497 |
| keywords[7].display_name | Randomized controlled trial |
| keywords[8].id | https://openalex.org/keywords/value |
| keywords[8].score | 0.4566165804862976 |
| keywords[8].display_name | Value (mathematics) |
| keywords[9].id | https://openalex.org/keywords/sample |
| keywords[9].score | 0.4205012917518616 |
| keywords[9].display_name | Sample (material) |
| keywords[10].id | https://openalex.org/keywords/mathematical-optimization |
| keywords[10].score | 0.3413822054862976 |
| keywords[10].display_name | Mathematical optimization |
| keywords[11].id | https://openalex.org/keywords/mathematics |
| keywords[11].score | 0.22781258821487427 |
| keywords[11].display_name | Mathematics |
| keywords[12].id | https://openalex.org/keywords/statistics |
| keywords[12].score | 0.1854797601699829 |
| keywords[12].display_name | Statistics |
| keywords[13].id | https://openalex.org/keywords/algorithm |
| keywords[13].score | 0.1613936722278595 |
| keywords[13].display_name | Algorithm |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:1706.04241 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/1706.04241 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/1706.04241 |
| locations[1].id | doi:10.48550/arxiv.1706.04241 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.1706.04241 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5015899120 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Ian Osband |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Ian Osband |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5045543562 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-8364-3746 |
| authorships[1].author.display_name | Benjamin Van Roy |
| authorships[1].author_position | last |
| authorships[1].raw_author_name | Benjamin Van Roy |
| authorships[1].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/1706.04241 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | On Optimistic versus Randomized Exploration in Reinforcement Learning |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11975 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9984999895095825 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Evolutionary Algorithms and Applications |
| related_works | https://openalex.org/W4306904969, https://openalex.org/W2138720691, https://openalex.org/W4362501864, https://openalex.org/W4380318855, https://openalex.org/W2031695474, https://openalex.org/W2024136090, https://openalex.org/W2386410636, https://openalex.org/W3038962357, https://openalex.org/W2025663273, https://openalex.org/W3099153698 |
| cited_by_count | 9 |
| counts_by_year[0].year | 2021 |
| counts_by_year[0].cited_by_count | 1 |
| counts_by_year[1].year | 2020 |
| counts_by_year[1].cited_by_count | 3 |
| counts_by_year[2].year | 2019 |
| counts_by_year[2].cited_by_count | 1 |
| counts_by_year[3].year | 2018 |
| counts_by_year[3].cited_by_count | 3 |
| counts_by_year[4].year | 2017 |
| counts_by_year[4].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:1706.04241 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/1706.04241 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/1706.04241 |
| primary_location.id | pmh:oai:arXiv.org:1706.04241 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/1706.04241 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/1706.04241 |
| publication_date | 2017-06-13 |
| publication_year | 2017 |
| referenced_works | https://openalex.org/W2949475445, https://openalex.org/W2149721706 |
| referenced_works_count | 2 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 2 |
| citation_normalized_percentile |