On Optimistic versus Randomized Exploration in Reinforcement Learning

2017 · Open Access · DOI: https://doi.org/10.48550/arxiv.1706.04241

We discuss the relative merits of optimistic and randomized approaches to exploration in reinforcement learning. Optimistic approaches presented in the literature apply an optimistic boost to the value estimate at each state-action pair and select actions that are greedy with respect to the resulting optimistic value function. Randomized approaches sample from among statistically plausible value functions and select actions that are greedy with respect to the random sample. Prior computational experience suggests that randomized approaches can lead to far more statistically efficient learning. We present two simple analytic examples that elucidate why this is the case. In principle, there should be optimistic approaches that fare well relative to randomized approaches, but that would require intractable computation. Optimistic approaches that have been proposed in the literature sacrifice statistical efficiency for the sake of computational efficiency. Randomized approaches, on the other hand, may enable simultaneous statistical and computational efficiency.
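The two selection rules described in the abstract can be sketched in the simplest possible setting: a single state with Bernoulli rewards, i.e. a multi-armed bandit. This is an illustrative assumption, not the paper's own examples or algorithms. The optimistic rule below uses a UCB1-style bonus as its "optimistic boost", and the randomized rule uses Thompson sampling from independent Beta(1 + successes, 1 + failures) posteriors as its "statistically plausible" sample. The function names and the `simulate` helper are hypothetical.

```python
import math
import random


def optimistic_action(successes, failures, t):
    """Greedy w.r.t. empirical means plus a UCB1-style optimistic boost."""
    def boosted_value(a):
        n = successes[a] + failures[a]
        if n == 0:
            return float("inf")  # an untried action is treated as maximally promising
        return successes[a] / n + math.sqrt(2.0 * math.log(t) / n)
    return max(range(len(successes)), key=boosted_value)


def randomized_action(successes, failures, rng):
    """Greedy w.r.t. one value vector sampled from Beta(1+s, 1+f) posteriors."""
    sampled = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
    return max(range(len(sampled)), key=lambda a: sampled[a])


def simulate(strategy, true_probs, steps, seed=0):
    """Run one strategy on a Bernoulli bandit; return per-action outcome counts."""
    rng = random.Random(seed)
    k = len(true_probs)
    successes, failures = [0] * k, [0] * k
    for t in range(1, steps + 1):
        if strategy == "optimistic":
            a = optimistic_action(successes, failures, t)
        else:
            a = randomized_action(successes, failures, rng)
        if rng.random() < true_probs[a]:
            successes[a] += 1
        else:
            failures[a] += 1
    return successes, failures


if __name__ == "__main__":
    for name in ("optimistic", "randomized"):
        s, f = simulate(name, true_probs=[0.2, 0.8], steps=500)
        print(name, "pulls per action:", [si + fi for si, fi in zip(s, f)])
```

Both rules are greedy with respect to some value vector; they differ only in how that vector is built. A bandit has no state transitions, so this sketch does not reproduce the statistical-efficiency gap the abstract attributes to the reinforcement learning setting.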

Related Topics

Reinforcement Learning

Computer Science

Artificial Intelligence

Bellman Equation

Machine Learning

Randomized Controlled Trial

Concepts

Reinforcement learning, Computer science, Artificial intelligence, Randomized experiment, Bellman equation, Randomized algorithm, Machine learning, Randomized controlled trial, Value (mathematics), Sample (material), Mathematical optimization, Mathematics, Statistics, Algorithm, Chemistry, Medicine, Chromatography, Surgery

Metadata

Type: preprint
Language: en
Landing Page: http://arxiv.org/abs/1706.04241
PDF: https://arxiv.org/pdf/1706.04241
OA Status: green
Cited By: 9
References: 2
Related Works: 10
OpenAlex ID: https://openalex.org/W2625705959

All OpenAlex metadata

OpenAlex ID: https://openalex.org/W2625705959
Canonical identifier for this work in OpenAlex

DOI: https://doi.org/10.48550/arxiv.1706.04241
Digital Object Identifier

Title: On Optimistic versus Randomized Exploration in Reinforcement Learning
Work title

Type: preprint
OpenAlex work type

Language: en
Primary language

Publication year: 2017
Year of publication

Publication date: 2017-06-13
Full publication date if available

Authors: Ian Osband, Benjamin Van Roy
List of authors in order

Landing page: https://arxiv.org/abs/1706.04241
Publisher landing page

PDF URL: https://arxiv.org/pdf/1706.04241
Direct link to full text PDF

Open access: Yes
Whether a free full text is available

OA status: green
Open access status per OpenAlex

OA URL: https://arxiv.org/pdf/1706.04241
Direct OA link when available

Concepts: Reinforcement learning, Computer science, Artificial intelligence, Randomized experiment, Bellman equation, Randomized algorithm, Machine learning, Randomized controlled trial, Value (mathematics), Sample (material), Mathematical optimization, Mathematics, Statistics, Algorithm, Chemistry, Medicine, Chromatography, Surgery
Top concepts (fields/topics) attached by OpenAlex

Cited by: 9
Total citation count in OpenAlex

Citations by year (recent): 2021: 1, 2020: 3, 2019: 1, 2018: 3, 2017: 1
Per-year citation counts (last 5 years)

References (count): 2
Number of works referenced by this work

Related works (count): 10
Other works algorithmically related by OpenAlex

Full payload

id	https://openalex.org/W2625705959
doi	https://doi.org/10.48550/arxiv.1706.04241
ids.doi	https://doi.org/10.48550/arxiv.1706.04241
ids.mag	2625705959
ids.openalex	https://openalex.org/W2625705959
fwci
type	preprint
title	On Optimistic versus Randomized Exploration in Reinforcement Learning
biblio.issue
biblio.volume
biblio.last_page
biblio.first_page
topics[0].id	https://openalex.org/T11975
topics[0].field.id	https://openalex.org/fields/17
topics[0].field.display_name	Computer Science
topics[0].score	0.9984999895095825
topics[0].domain.id	https://openalex.org/domains/3
topics[0].domain.display_name	Physical Sciences
topics[0].subfield.id	https://openalex.org/subfields/1702
topics[0].subfield.display_name	Artificial Intelligence
topics[0].display_name	Evolutionary Algorithms and Applications
topics[1].id	https://openalex.org/T10848
topics[1].field.id	https://openalex.org/fields/17
topics[1].field.display_name	Computer Science
topics[1].score	0.9943000078201294
topics[1].domain.id	https://openalex.org/domains/3
topics[1].domain.display_name	Physical Sciences
topics[1].subfield.id	https://openalex.org/subfields/1703
topics[1].subfield.display_name	Computational Theory and Mathematics
topics[1].display_name	Advanced Multi-Objective Optimization Algorithms
topics[2].id	https://openalex.org/T12101
topics[2].field.id	https://openalex.org/fields/18
topics[2].field.display_name	Decision Sciences
topics[2].score	0.9926000237464905
topics[2].domain.id	https://openalex.org/domains/2
topics[2].domain.display_name	Social Sciences
topics[2].subfield.id	https://openalex.org/subfields/1803
topics[2].subfield.display_name	Management Science and Operations Research
topics[2].display_name	Advanced Bandit Algorithms Research
is_xpac	False
apc_list
apc_paid
concepts[0].id	https://openalex.org/C97541855
concepts[0].level	2
concepts[0].score	0.7793627977371216
concepts[0].wikidata	https://www.wikidata.org/wiki/Q830687
concepts[0].display_name	Reinforcement learning
concepts[1].id	https://openalex.org/C41008148
concepts[1].level	0
concepts[1].score	0.6636861562728882
concepts[1].wikidata	https://www.wikidata.org/wiki/Q21198
concepts[1].display_name	Computer science
concepts[2].id	https://openalex.org/C154945302
concepts[2].level	1
concepts[2].score	0.5143705606460571
concepts[2].wikidata	https://www.wikidata.org/wiki/Q11660
concepts[2].display_name	Artificial intelligence
concepts[3].id	https://openalex.org/C155108698
concepts[3].level	2
concepts[3].score	0.5076652765274048
concepts[3].wikidata	https://www.wikidata.org/wiki/Q1231081
concepts[3].display_name	Randomized experiment
concepts[4].id	https://openalex.org/C14646407
concepts[4].level	2
concepts[4].score	0.480546772480011
concepts[4].wikidata	https://www.wikidata.org/wiki/Q1430750
concepts[4].display_name	Bellman equation
concepts[5].id	https://openalex.org/C128669082
concepts[5].level	2
concepts[5].score	0.47487255930900574
concepts[5].wikidata	https://www.wikidata.org/wiki/Q583461
concepts[5].display_name	Randomized algorithm
concepts[6].id	https://openalex.org/C119857082
concepts[6].level	1
concepts[6].score	0.4628215730190277
concepts[6].wikidata	https://www.wikidata.org/wiki/Q2539
concepts[6].display_name	Machine learning
concepts[7].id	https://openalex.org/C168563851
concepts[7].level	2
concepts[7].score	0.45829707384109497
concepts[7].wikidata	https://www.wikidata.org/wiki/Q1436668
concepts[7].display_name	Randomized controlled trial
concepts[8].id	https://openalex.org/C2776291640
concepts[8].level	2
concepts[8].score	0.4566165804862976
concepts[8].wikidata	https://www.wikidata.org/wiki/Q2912517
concepts[8].display_name	Value (mathematics)
concepts[9].id	https://openalex.org/C198531522
concepts[9].level	2
concepts[9].score	0.4205012917518616
concepts[9].wikidata	https://www.wikidata.org/wiki/Q485146
concepts[9].display_name	Sample (material)
concepts[10].id	https://openalex.org/C126255220
concepts[10].level	1
concepts[10].score	0.3413822054862976
concepts[10].wikidata	https://www.wikidata.org/wiki/Q141495
concepts[10].display_name	Mathematical optimization
concepts[11].id	https://openalex.org/C33923547
concepts[11].level	0
concepts[11].score	0.22781258821487427
concepts[11].wikidata	https://www.wikidata.org/wiki/Q395
concepts[11].display_name	Mathematics
concepts[12].id	https://openalex.org/C105795698
concepts[12].level	1
concepts[12].score	0.1854797601699829
concepts[12].wikidata	https://www.wikidata.org/wiki/Q12483
concepts[12].display_name	Statistics
concepts[13].id	https://openalex.org/C11413529
concepts[13].level	1
concepts[13].score	0.1613936722278595
concepts[13].wikidata	https://www.wikidata.org/wiki/Q8366
concepts[13].display_name	Algorithm
concepts[14].id	https://openalex.org/C185592680
concepts[14].level	0
concepts[14].score	0.0
concepts[14].wikidata	https://www.wikidata.org/wiki/Q2329
concepts[14].display_name	Chemistry
concepts[15].id	https://openalex.org/C71924100
concepts[15].level	0
concepts[15].score	0.0
concepts[15].wikidata	https://www.wikidata.org/wiki/Q11190
concepts[15].display_name	Medicine
concepts[16].id	https://openalex.org/C43617362
concepts[16].level	1
concepts[16].score	0.0
concepts[16].wikidata	https://www.wikidata.org/wiki/Q170050
concepts[16].display_name	Chromatography
concepts[17].id	https://openalex.org/C141071460
concepts[17].level	1
concepts[17].score	0.0
concepts[17].wikidata	https://www.wikidata.org/wiki/Q40821
concepts[17].display_name	Surgery
keywords[0].id	https://openalex.org/keywords/reinforcement-learning
keywords[0].score	0.7793627977371216
keywords[0].display_name	Reinforcement learning
keywords[1].id	https://openalex.org/keywords/computer-science
keywords[1].score	0.6636861562728882
keywords[1].display_name	Computer science
keywords[2].id	https://openalex.org/keywords/artificial-intelligence
keywords[2].score	0.5143705606460571
keywords[2].display_name	Artificial intelligence
keywords[3].id	https://openalex.org/keywords/randomized-experiment
keywords[3].score	0.5076652765274048
keywords[3].display_name	Randomized experiment
keywords[4].id	https://openalex.org/keywords/bellman-equation
keywords[4].score	0.480546772480011
keywords[4].display_name	Bellman equation
keywords[5].id	https://openalex.org/keywords/randomized-algorithm
keywords[5].score	0.47487255930900574
keywords[5].display_name	Randomized algorithm
keywords[6].id	https://openalex.org/keywords/machine-learning
keywords[6].score	0.4628215730190277
keywords[6].display_name	Machine learning
keywords[7].id	https://openalex.org/keywords/randomized-controlled-trial
keywords[7].score	0.45829707384109497
keywords[7].display_name	Randomized controlled trial
keywords[8].id	https://openalex.org/keywords/value
keywords[8].score	0.4566165804862976
keywords[8].display_name	Value (mathematics)
keywords[9].id	https://openalex.org/keywords/sample
keywords[9].score	0.4205012917518616
keywords[9].display_name	Sample (material)
keywords[10].id	https://openalex.org/keywords/mathematical-optimization
keywords[10].score	0.3413822054862976
keywords[10].display_name	Mathematical optimization
keywords[11].id	https://openalex.org/keywords/mathematics
keywords[11].score	0.22781258821487427
keywords[11].display_name	Mathematics
keywords[12].id	https://openalex.org/keywords/statistics
keywords[12].score	0.1854797601699829
keywords[12].display_name	Statistics
keywords[13].id	https://openalex.org/keywords/algorithm
keywords[13].score	0.1613936722278595
keywords[13].display_name	Algorithm
language	en
locations[0].id	pmh:oai:arXiv.org:1706.04241
locations[0].is_oa	True
locations[0].source.id	https://openalex.org/S4306400194
locations[0].source.issn
locations[0].source.type	repository
locations[0].source.is_oa	True
locations[0].source.issn_l
locations[0].source.is_core	False
locations[0].source.is_in_doaj	False
locations[0].source.display_name	arXiv (Cornell University)
locations[0].source.host_organization	https://openalex.org/I205783295
locations[0].source.host_organization_name	Cornell University
locations[0].source.host_organization_lineage	https://openalex.org/I205783295
locations[0].license
locations[0].pdf_url	https://arxiv.org/pdf/1706.04241
locations[0].version	submittedVersion
locations[0].raw_type
locations[0].license_id
locations[0].is_accepted	False
locations[0].is_published	False
locations[0].raw_source_name
locations[0].landing_page_url	http://arxiv.org/abs/1706.04241
locations[1].id	doi:10.48550/arxiv.1706.04241
locations[1].is_oa	True
locations[1].source.id	https://openalex.org/S4306400194
locations[1].source.issn
locations[1].source.type	repository
locations[1].source.is_oa	True
locations[1].source.issn_l
locations[1].source.is_core	False
locations[1].source.is_in_doaj	False
locations[1].source.display_name	arXiv (Cornell University)
locations[1].source.host_organization	https://openalex.org/I205783295
locations[1].source.host_organization_name	Cornell University
locations[1].source.host_organization_lineage	https://openalex.org/I205783295
locations[1].license
locations[1].pdf_url
locations[1].version
locations[1].raw_type	article
locations[1].license_id
locations[1].is_accepted	False
locations[1].is_published
locations[1].raw_source_name
locations[1].landing_page_url	https://doi.org/10.48550/arxiv.1706.04241
indexed_in	arxiv, datacite
authorships[0].author.id	https://openalex.org/A5015899120
authorships[0].author.orcid
authorships[0].author.display_name	Ian Osband
authorships[0].author_position	first
authorships[0].raw_author_name	Ian Osband
authorships[0].is_corresponding	False
authorships[1].author.id	https://openalex.org/A5045543562
authorships[1].author.orcid	https://orcid.org/0000-0001-8364-3746
authorships[1].author.display_name	Benjamin Van Roy
authorships[1].author_position	last
authorships[1].raw_author_name	Benjamin Van Roy
authorships[1].is_corresponding	False
has_content.pdf	False
has_content.grobid_xml	False
is_paratext	False
open_access.is_oa	True
open_access.oa_url	https://arxiv.org/pdf/1706.04241
open_access.oa_status	green
open_access.any_repository_has_fulltext	False
created_date	2025-10-10T00:00:00
display_name	On Optimistic versus Randomized Exploration in Reinforcement Learning
has_fulltext	False
is_retracted	False
updated_date	2025-11-06T06:51:31.235846
primary_topic.id	https://openalex.org/T11975
primary_topic.field.id	https://openalex.org/fields/17
primary_topic.field.display_name	Computer Science
primary_topic.score	0.9984999895095825
primary_topic.domain.id	https://openalex.org/domains/3
primary_topic.domain.display_name	Physical Sciences
primary_topic.subfield.id	https://openalex.org/subfields/1702
primary_topic.subfield.display_name	Artificial Intelligence
primary_topic.display_name	Evolutionary Algorithms and Applications
related_works	https://openalex.org/W4306904969, https://openalex.org/W2138720691, https://openalex.org/W4362501864, https://openalex.org/W4380318855, https://openalex.org/W2031695474, https://openalex.org/W2024136090, https://openalex.org/W2386410636, https://openalex.org/W3038962357, https://openalex.org/W2025663273, https://openalex.org/W3099153698
cited_by_count	9
counts_by_year[0].year	2021
counts_by_year[0].cited_by_count	1
counts_by_year[1].year	2020
counts_by_year[1].cited_by_count	3
counts_by_year[2].year	2019
counts_by_year[2].cited_by_count	1
counts_by_year[3].year	2018
counts_by_year[3].cited_by_count	3
counts_by_year[4].year	2017
counts_by_year[4].cited_by_count	1
locations_count	2
best_oa_location.id	pmh:oai:arXiv.org:1706.04241
best_oa_location.is_oa	True
best_oa_location.source.id	https://openalex.org/S4306400194
best_oa_location.source.issn
best_oa_location.source.type	repository
best_oa_location.source.is_oa	True
best_oa_location.source.issn_l
best_oa_location.source.is_core	False
best_oa_location.source.is_in_doaj	False
best_oa_location.source.display_name	arXiv (Cornell University)
best_oa_location.source.host_organization	https://openalex.org/I205783295
best_oa_location.source.host_organization_name	Cornell University
best_oa_location.source.host_organization_lineage	https://openalex.org/I205783295
best_oa_location.license
best_oa_location.pdf_url	https://arxiv.org/pdf/1706.04241
best_oa_location.version	submittedVersion
best_oa_location.raw_type
best_oa_location.license_id
best_oa_location.is_accepted	False
best_oa_location.is_published	False
best_oa_location.raw_source_name
best_oa_location.landing_page_url	http://arxiv.org/abs/1706.04241
primary_location.id	pmh:oai:arXiv.org:1706.04241
primary_location.is_oa	True
primary_location.source.id	https://openalex.org/S4306400194
primary_location.source.issn
primary_location.source.type	repository
primary_location.source.is_oa	True
primary_location.source.issn_l
primary_location.source.is_core	False
primary_location.source.is_in_doaj	False
primary_location.source.display_name	arXiv (Cornell University)
primary_location.source.host_organization	https://openalex.org/I205783295
primary_location.source.host_organization_name	Cornell University
primary_location.source.host_organization_lineage	https://openalex.org/I205783295
primary_location.license
primary_location.pdf_url	https://arxiv.org/pdf/1706.04241
primary_location.version	submittedVersion
primary_location.raw_type
primary_location.license_id
primary_location.is_accepted	False
primary_location.is_published	False
primary_location.raw_source_name
primary_location.landing_page_url	http://arxiv.org/abs/1706.04241
publication_date	2017-06-13
publication_year	2017
referenced_works	https://openalex.org/W2949475445, https://openalex.org/W2149721706
referenced_works_count	2
abstract_inverted_index.In	96
abstract_inverted_index.We	0, 83
abstract_inverted_index.an	22
abstract_inverted_index.at	29
abstract_inverted_index.be	100
abstract_inverted_index.in	12, 18, 122
abstract_inverted_index.is	93
abstract_inverted_index.of	5, 131
abstract_inverted_index.on	136
abstract_inverted_index.to	10, 25, 41, 64, 77, 107
abstract_inverted_index.and	7, 33, 56, 144
abstract_inverted_index.are	37, 60
abstract_inverted_index.but	110
abstract_inverted_index.can	75
abstract_inverted_index.far	78
abstract_inverted_index.for	128
abstract_inverted_index.may	140
abstract_inverted_index.the	2, 19, 26, 42, 65, 94, 123, 129, 137
abstract_inverted_index.two	85
abstract_inverted_index.why	91
abstract_inverted_index.been	120
abstract_inverted_index.each	30
abstract_inverted_index.fare	104
abstract_inverted_index.from	50
abstract_inverted_index.have	119
abstract_inverted_index.lead	76
abstract_inverted_index.more	79
abstract_inverted_index.pair	32
abstract_inverted_index.sake	130
abstract_inverted_index.that	36, 59, 72, 89, 103, 111, 118
abstract_inverted_index.this	92
abstract_inverted_index.well	105
abstract_inverted_index.with	39, 62
abstract_inverted_index.Prior	68
abstract_inverted_index.among	51
abstract_inverted_index.apply	21
abstract_inverted_index.boost	24
abstract_inverted_index.case.	95
abstract_inverted_index.hand,	139
abstract_inverted_index.other	138
abstract_inverted_index.there	98
abstract_inverted_index.value	27, 45, 54
abstract_inverted_index.would	112
abstract_inverted_index.enable	141
abstract_inverted_index.greedy	38, 61
abstract_inverted_index.merits	4
abstract_inverted_index.random	66
abstract_inverted_index.sample	49
abstract_inverted_index.select	34, 57
abstract_inverted_index.should	99
abstract_inverted_index.simple	86
abstract_inverted_index.actions	35, 58
abstract_inverted_index.discuss	1
abstract_inverted_index.present	84
abstract_inverted_index.require	113
abstract_inverted_index.respect	40, 63
abstract_inverted_index.sample.	67
abstract_inverted_index.analytic	87
abstract_inverted_index.estimate	28
abstract_inverted_index.examples	88
abstract_inverted_index.proposed	121
abstract_inverted_index.relative	3, 106
abstract_inverted_index.suggests	71
abstract_inverted_index.efficient	81
abstract_inverted_index.elucidate	90
abstract_inverted_index.function.	46
abstract_inverted_index.functions	55
abstract_inverted_index.learning.	14, 82
abstract_inverted_index.plausible	53
abstract_inverted_index.presented	17
abstract_inverted_index.resulting	43
abstract_inverted_index.sacrifice	125
abstract_inverted_index.Optimistic	15, 116
abstract_inverted_index.Randomized	47, 134
abstract_inverted_index.approaches	9, 16, 48, 74, 102, 117
abstract_inverted_index.efficiency	127
abstract_inverted_index.experience	70
abstract_inverted_index.literature	20, 124
abstract_inverted_index.optimistic	6, 23, 44, 101
abstract_inverted_index.principle,	97
abstract_inverted_index.randomized	8, 73, 108
abstract_inverted_index.approaches,	109, 135
abstract_inverted_index.efficiency.	133, 146
abstract_inverted_index.exploration	11
abstract_inverted_index.intractable	114
abstract_inverted_index.statistical	126, 143
abstract_inverted_index.computation.	115
abstract_inverted_index.simultaneous	142
abstract_inverted_index.state-action	31
abstract_inverted_index.computational	69, 132, 145
abstract_inverted_index.reinforcement	13
abstract_inverted_index.statistically	52, 80
cited_by_percentile_year
countries_distinct_count	0
institutions_distinct_count	2
citation_normalized_percentile
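
The `abstract_inverted_index.*` rows in the payload above store the abstract as a map from each word to the zero-based positions where it occurs. A minimal sketch of turning such a map back into running text follows. The function name `rebuild_abstract` is hypothetical, and the sample dictionary is only an excerpt covering positions 0 through 6 of the rows listed above.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a {word: [positions]} inverted index."""
    positioned = [
        (pos, word)
        for word, positions in inverted_index.items()
        for pos in positions
    ]
    # Sorting by position restores the original word order.
    return " ".join(word for _, word in sorted(positioned))


# Excerpt only: positions 0..6 taken from the payload rows above.
sample = {
    "We": [0],
    "discuss": [1],
    "the": [2],
    "relative": [3],
    "merits": [4],
    "of": [5],
    "optimistic": [6],
}

print(rebuild_abstract(sample))  # prints "We discuss the relative merits of optimistic"
```

Applied to the full set of rows, the same procedure should yield the abstract shown at the top of this record.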