Randomized Prior Functions for Deep Reinforcement Learning
2018 · Open Access · DOI: https://doi.org/10.48550/arxiv.1806.03335
Dealing with uncertainty is essential for efficient reinforcement learning. There is a growing literature on uncertainty estimation for deep learning from fixed datasets, but many of the most popular approaches are poorly-suited to sequential decision problems. Other methods, such as bootstrap sampling, have no mechanism for uncertainty that does not come from the observed data. We highlight why this can be a crucial shortcoming and propose a simple remedy through addition of a randomized untrainable 'prior' network to each ensemble member. We prove that this approach is efficient with linear representations, provide simple illustrations of its efficacy with nonlinear representations and show that this approach scales to large-scale problems far better than previous attempts.
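
The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a tabular (one-hot) linear model instead of a neural network, and the function names, the prior scale `beta`, and the ridge penalty `reg` are hypothetical choices made for illustration. Each ensemble member predicts `theta[s] + beta * prior[s]`, where `prior` is drawn once at random and never trained, and `theta` is fit on a bootstrap resample of the data.

```python
import random
import statistics


def fit_member(data, n_states, prior, beta, reg, rng):
    """Fit one ensemble member; return its prediction for every state.

    The prediction is theta[s] + beta * prior[s]. Only theta is trained;
    the prior is fixed (assumption: tabular features, ridge penalty `reg`).
    """
    # Bootstrap: resample the observed (state, target) pairs with replacement.
    sample = [rng.choice(data) for _ in data] if data else []
    sums = [0.0] * n_states
    counts = [0] * n_states
    for s, y in sample:
        # The trainable part fits the residual left over by the fixed prior.
        sums[s] += y - beta * prior[s]
        counts[s] += 1
    # Closed-form ridge solution per state; theta is 0 where there is no data.
    theta = [sums[s] / (counts[s] + reg) for s in range(n_states)]
    return [theta[s] + beta * prior[s] for s in range(n_states)]


def ensemble_spread(data, n_states, beta, n_members=20, reg=1.0, seed=0):
    """Standard deviation of the ensemble's predictions at each state."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_members):
        # Each member gets its own randomly drawn, untrainable prior.
        prior = [rng.gauss(0.0, 1.0) for _ in range(n_states)]
        preds.append(fit_member(data, n_states, prior, beta, reg, rng))
    return [statistics.pstdev([p[s] for p in preds]) for s in range(n_states)]


# State 0 is observed 50 times with target 1.0; state 1 is never observed.
data = [(0, 1.0)] * 50
no_prior = ensemble_spread(data, n_states=2, beta=0.0)
with_prior = ensemble_spread(data, n_states=2, beta=1.0)
print("spread without prior:", no_prior)
print("spread with prior:   ", with_prior)
```

In this toy setting the plain bootstrap ensemble (`beta=0.0`) shows zero disagreement at the unobserved state, because all members fall back to the same untrained value. With the prior enabled, members disagree at the unobserved state and agree much more closely where data is plentiful.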
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/1806.03335, https://arxiv.org/pdf/1806.03335
- OA Status: green
- Cited By: 105
- References: 6
- Related Works: 10
- OpenAlex ID: https://openalex.org/W2807588596
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W2807588596 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.1806.03335 (Digital Object Identifier)
- Title: Randomized Prior Functions for Deep Reinforcement Learning (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2018 (year of publication)
- Publication date: 2018-06-08 (full publication date if available)
- Authors: Ian Osband, John Aslanides, Albin Cassirer (list of authors in order)
- Landing page: https://arxiv.org/abs/1806.03335 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/1806.03335 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/1806.03335 (direct OA link when available)
- Concepts: Reinforcement learning, Artificial intelligence, Computer science, Machine learning, Simple (philosophy), Scale (ratio), Philosophy, Epistemology, Quantum mechanics, Physics (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 105 (total citation count in OpenAlex)
- Citations by year (recent): 2025: 6, 2024: 10, 2023: 28, 2022: 15, 2021: 19 (per-year citation counts, last 5 years)
- References (count): 6 (number of works referenced by this work)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W2807588596 |
| doi | https://doi.org/10.48550/arxiv.1806.03335 |
| ids.doi | https://doi.org/10.48550/arxiv.1806.03335 |
| ids.mag | 2807588596 |
| ids.openalex | https://openalex.org/W2807588596 |
| fwci | |
| type | preprint |
| title | Randomized Prior Functions for Deep Reinforcement Learning |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10462 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9998999834060669 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Reinforcement Learning in Robotics |
| topics[1].id | https://openalex.org/T12101 |
| topics[1].field.id | https://openalex.org/fields/18 |
| topics[1].field.display_name | Decision Sciences |
| topics[1].score | 0.9994000196456909 |
| topics[1].domain.id | https://openalex.org/domains/2 |
| topics[1].domain.display_name | Social Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1803 |
| topics[1].subfield.display_name | Management Science and Operations Research |
| topics[1].display_name | Advanced Bandit Algorithms Research |
| topics[2].id | https://openalex.org/T10848 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9962999820709229 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1703 |
| topics[2].subfield.display_name | Computational Theory and Mathematics |
| topics[2].display_name | Advanced Multi-Objective Optimization Algorithms |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C97541855 |
| concepts[0].level | 2 |
| concepts[0].score | 0.8068270683288574 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q830687 |
| concepts[0].display_name | Reinforcement learning |
| concepts[1].id | https://openalex.org/C154945302 |
| concepts[1].level | 1 |
| concepts[1].score | 0.6876015067100525 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[1].display_name | Artificial intelligence |
| concepts[2].id | https://openalex.org/C41008148 |
| concepts[2].level | 0 |
| concepts[2].score | 0.6485185027122498 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[2].display_name | Computer science |
| concepts[3].id | https://openalex.org/C119857082 |
| concepts[3].level | 1 |
| concepts[3].score | 0.6134330630302429 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q2539 |
| concepts[3].display_name | Machine learning |
| concepts[4].id | https://openalex.org/C2780586882 |
| concepts[4].level | 2 |
| concepts[4].score | 0.5862534046173096 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q7520643 |
| concepts[4].display_name | Simple (philosophy) |
| concepts[5].id | https://openalex.org/C2778755073 |
| concepts[5].level | 2 |
| concepts[5].score | 0.46834418177604675 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q10858537 |
| concepts[5].display_name | Scale (ratio) |
| concepts[6].id | https://openalex.org/C138885662 |
| concepts[6].level | 0 |
| concepts[6].score | 0.0 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q5891 |
| concepts[6].display_name | Philosophy |
| concepts[7].id | https://openalex.org/C111472728 |
| concepts[7].level | 1 |
| concepts[7].score | 0.0 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q9471 |
| concepts[7].display_name | Epistemology |
| concepts[8].id | https://openalex.org/C62520636 |
| concepts[8].level | 1 |
| concepts[8].score | 0.0 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q944 |
| concepts[8].display_name | Quantum mechanics |
| concepts[9].id | https://openalex.org/C121332964 |
| concepts[9].level | 0 |
| concepts[9].score | 0.0 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q413 |
| concepts[9].display_name | Physics |
| keywords[0].id | https://openalex.org/keywords/reinforcement-learning |
| keywords[0].score | 0.8068270683288574 |
| keywords[0].display_name | Reinforcement learning |
| keywords[1].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[1].score | 0.6876015067100525 |
| keywords[1].display_name | Artificial intelligence |
| keywords[2].id | https://openalex.org/keywords/computer-science |
| keywords[2].score | 0.6485185027122498 |
| keywords[2].display_name | Computer science |
| keywords[3].id | https://openalex.org/keywords/machine-learning |
| keywords[3].score | 0.6134330630302429 |
| keywords[3].display_name | Machine learning |
| keywords[4].id | https://openalex.org/keywords/simple |
| keywords[4].score | 0.5862534046173096 |
| keywords[4].display_name | Simple (philosophy) |
| keywords[5].id | https://openalex.org/keywords/scale |
| keywords[5].score | 0.46834418177604675 |
| keywords[5].display_name | Scale (ratio) |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:1806.03335 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/1806.03335 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/1806.03335 |
| locations[1].id | doi:10.48550/arxiv.1806.03335 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.1806.03335 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5015899120 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Ian Osband |
| authorships[0].countries | GB |
| authorships[0].affiliations[0].institution_ids | https://openalex.org/I4210090411 |
| authorships[0].affiliations[0].raw_affiliation_string | DeepMind#TAB# |
| authorships[0].institutions[0].id | https://openalex.org/I4210090411 |
| authorships[0].institutions[0].ror | https://ror.org/00971b260 |
| authorships[0].institutions[0].type | company |
| authorships[0].institutions[0].lineage | https://openalex.org/I4210090411, https://openalex.org/I4210128969 |
| authorships[0].institutions[0].country_code | GB |
| authorships[0].institutions[0].display_name | DeepMind (United Kingdom) |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Ian Osband |
| authorships[0].is_corresponding | False |
| authorships[0].raw_affiliation_strings | DeepMind#TAB# |
| authorships[1].author.id | https://openalex.org/A5029807267 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | John Aslanides |
| authorships[1].countries | GB |
| authorships[1].affiliations[0].institution_ids | https://openalex.org/I4210090411 |
| authorships[1].affiliations[0].raw_affiliation_string | DeepMind#TAB# |
| authorships[1].institutions[0].id | https://openalex.org/I4210090411 |
| authorships[1].institutions[0].ror | https://ror.org/00971b260 |
| authorships[1].institutions[0].type | company |
| authorships[1].institutions[0].lineage | https://openalex.org/I4210090411, https://openalex.org/I4210128969 |
| authorships[1].institutions[0].country_code | GB |
| authorships[1].institutions[0].display_name | DeepMind (United Kingdom) |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | John Aslanides |
| authorships[1].is_corresponding | False |
| authorships[1].raw_affiliation_strings | DeepMind#TAB# |
| authorships[2].author.id | https://openalex.org/A5030260581 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Albin Cassirer |
| authorships[2].countries | GB |
| authorships[2].affiliations[0].institution_ids | https://openalex.org/I4210090411 |
| authorships[2].affiliations[0].raw_affiliation_string | DeepMind#TAB# |
| authorships[2].institutions[0].id | https://openalex.org/I4210090411 |
| authorships[2].institutions[0].ror | https://ror.org/00971b260 |
| authorships[2].institutions[0].type | company |
| authorships[2].institutions[0].lineage | https://openalex.org/I4210090411, https://openalex.org/I4210128969 |
| authorships[2].institutions[0].country_code | GB |
| authorships[2].institutions[0].display_name | DeepMind (United Kingdom) |
| authorships[2].author_position | last |
| authorships[2].raw_author_name | Albin Cassirer |
| authorships[2].is_corresponding | False |
| authorships[2].raw_affiliation_strings | DeepMind#TAB# |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/1806.03335 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Randomized Prior Functions for Deep Reinforcement Learning |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10462 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9998999834060669 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Reinforcement Learning in Robotics |
| related_works | https://openalex.org/W2961085424, https://openalex.org/W4306674287, https://openalex.org/W3046775127, https://openalex.org/W3107602296, https://openalex.org/W3170094116, https://openalex.org/W4386462264, https://openalex.org/W4364306694, https://openalex.org/W4312192474, https://openalex.org/W4283697347, https://openalex.org/W4210805261 |
| cited_by_count | 105 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 6 |
| counts_by_year[1].year | 2024 |
| counts_by_year[1].cited_by_count | 10 |
| counts_by_year[2].year | 2023 |
| counts_by_year[2].cited_by_count | 28 |
| counts_by_year[3].year | 2022 |
| counts_by_year[3].cited_by_count | 15 |
| counts_by_year[4].year | 2021 |
| counts_by_year[4].cited_by_count | 19 |
| counts_by_year[5].year | 2020 |
| counts_by_year[5].cited_by_count | 14 |
| counts_by_year[6].year | 2019 |
| counts_by_year[6].cited_by_count | 8 |
| counts_by_year[7].year | 2018 |
| counts_by_year[7].cited_by_count | 5 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:1806.03335 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/1806.03335 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/1806.03335 |
| primary_location.id | pmh:oai:arXiv.org:1806.03335 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/1806.03335 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/1806.03335 |
| publication_date | 2018-06-08 |
| publication_year | 2018 |
| referenced_works | https://openalex.org/W2124181495, https://openalex.org/W2507592741, https://openalex.org/W2121863487, https://openalex.org/W2786928559, https://openalex.org/W2470693974, https://openalex.org/W1975950946 |
| referenced_works_count | 6 |
| abstract_inverted_index.a | 11, 61, 66, 72 |
| abstract_inverted_index.We | 55, 81 |
| abstract_inverted_index.as | 39 |
| abstract_inverted_index.be | 60 |
| abstract_inverted_index.is | 3, 10, 86 |
| abstract_inverted_index.no | 43 |
| abstract_inverted_index.of | 25, 71, 94 |
| abstract_inverted_index.on | 14 |
| abstract_inverted_index.to | 32, 77, 106 |
| abstract_inverted_index.and | 64, 100 |
| abstract_inverted_index.are | 30 |
| abstract_inverted_index.but | 23 |
| abstract_inverted_index.can | 59 |
| abstract_inverted_index.far | 109 |
| abstract_inverted_index.for | 5, 17, 45 |
| abstract_inverted_index.its | 95 |
| abstract_inverted_index.not | 49 |
| abstract_inverted_index.the | 26, 52 |
| abstract_inverted_index.why | 57 |
| abstract_inverted_index.come | 50 |
| abstract_inverted_index.deep | 18 |
| abstract_inverted_index.does | 48 |
| abstract_inverted_index.each | 78 |
| abstract_inverted_index.from | 20, 51 |
| abstract_inverted_index.have | 42 |
| abstract_inverted_index.many | 24 |
| abstract_inverted_index.most | 27 |
| abstract_inverted_index.show | 101 |
| abstract_inverted_index.such | 38 |
| abstract_inverted_index.than | 111 |
| abstract_inverted_index.that | 47, 83, 102 |
| abstract_inverted_index.this | 58, 84, 103 |
| abstract_inverted_index.with | 1, 88, 97 |
| abstract_inverted_index.Other | 36 |
| abstract_inverted_index.There | 9 |
| abstract_inverted_index.data. | 54 |
| abstract_inverted_index.fixed | 21 |
| abstract_inverted_index.prove | 82 |
| abstract_inverted_index.better | 110 |
| abstract_inverted_index.linear | 89 |
| abstract_inverted_index.remedy | 68 |
| abstract_inverted_index.scales | 105 |
| abstract_inverted_index.simple | 67, 92 |
| abstract_inverted_index.Dealing | 0 |
| abstract_inverted_index.`prior' | 75 |
| abstract_inverted_index.crucial | 62 |
| abstract_inverted_index.growing | 12 |
| abstract_inverted_index.member. | 80 |
| abstract_inverted_index.network | 76 |
| abstract_inverted_index.popular | 28 |
| abstract_inverted_index.propose | 65 |
| abstract_inverted_index.provide | 91 |
| abstract_inverted_index.through | 69 |
| abstract_inverted_index.addition | 70 |
| abstract_inverted_index.approach | 85, 104 |
| abstract_inverted_index.decision | 34 |
| abstract_inverted_index.efficacy | 96 |
| abstract_inverted_index.ensemble | 79 |
| abstract_inverted_index.learning | 19 |
| abstract_inverted_index.methods, | 37 |
| abstract_inverted_index.observed | 53 |
| abstract_inverted_index.previous | 112 |
| abstract_inverted_index.problems | 108 |
| abstract_inverted_index.attempts. | 113 |
| abstract_inverted_index.bootstrap | 40 |
| abstract_inverted_index.datasets, | 22 |
| abstract_inverted_index.efficient | 6, 87 |
| abstract_inverted_index.essential | 4 |
| abstract_inverted_index.highlight | 56 |
| abstract_inverted_index.learning. | 8 |
| abstract_inverted_index.mechanism | 44 |
| abstract_inverted_index.nonlinear | 98 |
| abstract_inverted_index.problems. | 35 |
| abstract_inverted_index.sampling, | 41 |
| abstract_inverted_index.approaches | 29 |
| abstract_inverted_index.estimation | 16 |
| abstract_inverted_index.literature | 13 |
| abstract_inverted_index.randomized | 73 |
| abstract_inverted_index.sequential | 33 |
| abstract_inverted_index.large-scale | 107 |
| abstract_inverted_index.shortcoming | 63 |
| abstract_inverted_index.uncertainty | 2, 15, 46 |
| abstract_inverted_index.untrainable | 74 |
| abstract_inverted_index.illustrations | 93 |
| abstract_inverted_index.poorly-suited | 31 |
| abstract_inverted_index.reinforcement | 7 |
| abstract_inverted_index.representations | 99 |
| abstract_inverted_index.representations, | 90 |
| cited_by_percentile_year | |
| countries_distinct_count | 1 |
| institutions_distinct_count | 3 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/16 |
| sustainable_development_goals[0].score | 0.7900000214576721 |
| sustainable_development_goals[0].display_name | Peace, Justice and strong institutions |
| citation_normalized_percentile |
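
The `abstract_inverted_index` rows in the payload store the abstract as a map from each token to the word positions where it occurs. A minimal sketch of turning such a map back into text is given below. The function name `rebuild_abstract` is hypothetical, and the sample dictionary keeps only positions 0 to 8 from the rows listed in the table, so it reproduces just the first sentence.

```python
def rebuild_abstract(inverted_index):
    """Rebuild plain text from a token -> list-of-positions mapping."""
    position_to_token = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            position_to_token[pos] = token
    # Emit tokens in position order, separated by single spaces.
    return " ".join(position_to_token[pos] for pos in sorted(position_to_token))


# Subset of the payload rows, restricted to word positions 0..8.
sample_index = {
    "Dealing": [0],
    "with": [1],
    "uncertainty": [2],
    "is": [3],
    "essential": [4],
    "for": [5],
    "efficient": [6],
    "reinforcement": [7],
    "learning.": [8],
}

print(rebuild_abstract(sample_index))
# prints: Dealing with uncertainty is essential for efficient reinforcement learning.
```

Passing the full mapping from the payload would, under the same assumption about its layout, yield the complete abstract with punctuation attached to the tokens as stored.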