Scalable Coordinated Exploration in Concurrent Reinforcement Learning
Maria Dimakopoulou, Ian Osband, Benjamin Van Roy
2018 · Open Access · DOI: https://doi.org/10.48550/arxiv.1805.08948
We consider a team of reinforcement learning agents that concurrently operate in a common environment, and we develop an approach to efficient coordinated exploration that is suitable for problems of practical scale. Our approach builds on seed sampling (Dimakopoulou and Van Roy, 2018) and randomized value function learning (Osband et al., 2016). We demonstrate that, for simple tabular contexts, the approach is competitive with previously proposed tabular model learning methods (Dimakopoulou and Van Roy, 2018). With a higher-dimensional problem and a neural network value function representation, the approach learns quickly with far fewer agents than alternative exploration schemes.
Metadata
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/1805.08948, https://arxiv.org/pdf/1805.08948
- OA Status: green
- Cited By: 8
- References: 18
- Related Works: 10
- OpenAlex ID: https://openalex.org/W2803728337
All OpenAlex metadata
- OpenAlex ID: https://openalex.org/W2803728337 (Canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.1805.08948 (Digital Object Identifier)
- Title: Scalable Coordinated Exploration in Concurrent Reinforcement Learning (Work title)
- Type: preprint (OpenAlex work type)
- Language: en (Primary language)
- Publication year: 2018 (Year of publication)
- Publication date: 2018-05-23 (Full publication date if available)
- Authors: Maria Dimakopoulou, Ian Osband, Benjamin Van Roy (List of authors in order)
- Landing page: https://arxiv.org/abs/1805.08948 (Publisher landing page)
- PDF URL: https://arxiv.org/pdf/1805.08948 (Direct link to full text PDF)
- Open access: Yes (Whether a free full text is available)
- OA status: green (Open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/1805.08948 (Direct OA link when available)
- Concepts: Reinforcement learning, Scalability, Computer science, Reinforcement, Distributed computing, Computer architecture, Artificial intelligence, Psychology, Operating system, Social psychology (Top concepts (fields/topics) attached by OpenAlex)
- Cited by: 8 (Total citation count in OpenAlex)
- Citations by year (recent): 2023: 3, 2022: 1, 2021: 1, 2020: 2, 2019: 1 (Per-year citation counts (last 5 years))
- References (count): 18 (Number of works referenced by this work)
- Related works (count): 10 (Other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W2803728337 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.1805.08948 |
| ids.doi | https://doi.org/10.48550/arxiv.1805.08948 |
| ids.mag | 2803728337 |
| ids.openalex | https://openalex.org/W2803728337 |
| fwci | |
| type | preprint |
| title | Scalable Coordinated Exploration in Concurrent Reinforcement Learning |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10462 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9998999834060669 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Reinforcement Learning in Robotics |
| topics[1].id | https://openalex.org/T12101 |
| topics[1].field.id | https://openalex.org/fields/18 |
| topics[1].field.display_name | Decision Sciences |
| topics[1].score | 0.9977999925613403 |
| topics[1].domain.id | https://openalex.org/domains/2 |
| topics[1].domain.display_name | Social Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1803 |
| topics[1].subfield.display_name | Management Science and Operations Research |
| topics[1].display_name | Advanced Bandit Algorithms Research |
| topics[2].id | https://openalex.org/T10586 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9955999851226807 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1707 |
| topics[2].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[2].display_name | Robotic Path Planning Algorithms |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C97541855 |
| concepts[0].level | 2 |
| concepts[0].score | 0.8262474536895752 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q830687 |
| concepts[0].display_name | Reinforcement learning |
| concepts[1].id | https://openalex.org/C48044578 |
| concepts[1].level | 2 |
| concepts[1].score | 0.727385401725769 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q727490 |
| concepts[1].display_name | Scalability |
| concepts[2].id | https://openalex.org/C41008148 |
| concepts[2].level | 0 |
| concepts[2].score | 0.6446765661239624 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[2].display_name | Computer science |
| concepts[3].id | https://openalex.org/C67203356 |
| concepts[3].level | 2 |
| concepts[3].score | 0.5567582845687866 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q1321905 |
| concepts[3].display_name | Reinforcement |
| concepts[4].id | https://openalex.org/C120314980 |
| concepts[4].level | 1 |
| concepts[4].score | 0.39724791049957275 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q180634 |
| concepts[4].display_name | Distributed computing |
| concepts[5].id | https://openalex.org/C118524514 |
| concepts[5].level | 1 |
| concepts[5].score | 0.3332359790802002 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q173212 |
| concepts[5].display_name | Computer architecture |
| concepts[6].id | https://openalex.org/C154945302 |
| concepts[6].level | 1 |
| concepts[6].score | 0.2615622282028198 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[6].display_name | Artificial intelligence |
| concepts[7].id | https://openalex.org/C15744967 |
| concepts[7].level | 0 |
| concepts[7].score | 0.16070079803466797 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q9418 |
| concepts[7].display_name | Psychology |
| concepts[8].id | https://openalex.org/C111919701 |
| concepts[8].level | 1 |
| concepts[8].score | 0.0721149742603302 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q9135 |
| concepts[8].display_name | Operating system |
| concepts[9].id | https://openalex.org/C77805123 |
| concepts[9].level | 1 |
| concepts[9].score | 0.05688813328742981 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q161272 |
| concepts[9].display_name | Social psychology |
| keywords[0].id | https://openalex.org/keywords/reinforcement-learning |
| keywords[0].score | 0.8262474536895752 |
| keywords[0].display_name | Reinforcement learning |
| keywords[1].id | https://openalex.org/keywords/scalability |
| keywords[1].score | 0.727385401725769 |
| keywords[1].display_name | Scalability |
| keywords[2].id | https://openalex.org/keywords/computer-science |
| keywords[2].score | 0.6446765661239624 |
| keywords[2].display_name | Computer science |
| keywords[3].id | https://openalex.org/keywords/reinforcement |
| keywords[3].score | 0.5567582845687866 |
| keywords[3].display_name | Reinforcement |
| keywords[4].id | https://openalex.org/keywords/distributed-computing |
| keywords[4].score | 0.39724791049957275 |
| keywords[4].display_name | Distributed computing |
| keywords[5].id | https://openalex.org/keywords/computer-architecture |
| keywords[5].score | 0.3332359790802002 |
| keywords[5].display_name | Computer architecture |
| keywords[6].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[6].score | 0.2615622282028198 |
| keywords[6].display_name | Artificial intelligence |
| keywords[7].id | https://openalex.org/keywords/psychology |
| keywords[7].score | 0.16070079803466797 |
| keywords[7].display_name | Psychology |
| keywords[8].id | https://openalex.org/keywords/operating-system |
| keywords[8].score | 0.0721149742603302 |
| keywords[8].display_name | Operating system |
| keywords[9].id | https://openalex.org/keywords/social-psychology |
| keywords[9].score | 0.05688813328742981 |
| keywords[9].display_name | Social psychology |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:1805.08948 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/1805.08948 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/1805.08948 |
| locations[1].id | doi:10.48550/arxiv.1805.08948 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.1805.08948 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5068328730 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Maria Dimakopoulou |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Maria Dimakopoulou |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5015899120 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Ian Osband |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Ian Osband |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5045543562 |
| authorships[2].author.orcid | https://orcid.org/0000-0001-8364-3746 |
| authorships[2].author.display_name | Benjamin Van Roy |
| authorships[2].author_position | last |
| authorships[2].raw_author_name | Benjamin Van Roy |
| authorships[2].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/1805.08948 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Scalable Coordinated Exploration in Concurrent Reinforcement Learning |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10462 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9998999834060669 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Reinforcement Learning in Robotics |
| related_works | https://openalex.org/W2920061524, https://openalex.org/W4310083477, https://openalex.org/W2328553770, https://openalex.org/W1982914007, https://openalex.org/W2159583675, https://openalex.org/W1824242903, https://openalex.org/W1493858311, https://openalex.org/W2155470929, https://openalex.org/W2394465510, https://openalex.org/W2111125783 |
| cited_by_count | 8 |
| counts_by_year[0].year | 2023 |
| counts_by_year[0].cited_by_count | 3 |
| counts_by_year[1].year | 2022 |
| counts_by_year[1].cited_by_count | 1 |
| counts_by_year[2].year | 2021 |
| counts_by_year[2].cited_by_count | 1 |
| counts_by_year[3].year | 2020 |
| counts_by_year[3].cited_by_count | 2 |
| counts_by_year[4].year | 2019 |
| counts_by_year[4].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:1805.08948 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/1805.08948 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/1805.08948 |
| primary_location.id | pmh:oai:arXiv.org:1805.08948 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/1805.08948 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/1805.08948 |
| publication_date | 2018-05-23 |
| publication_year | 2018 |
| referenced_works | https://openalex.org/W2489939061, https://openalex.org/W2731829640, https://openalex.org/W2073384958, https://openalex.org/W2962910611, https://openalex.org/W2567415945, https://openalex.org/W2142620093, https://openalex.org/W2963158178, https://openalex.org/W2625705959, https://openalex.org/W2781585732, https://openalex.org/W1850488217, https://openalex.org/W2214971211, https://openalex.org/W1582436621, https://openalex.org/W2962767126, https://openalex.org/W21891419, https://openalex.org/W2121863487, https://openalex.org/W1522301498, https://openalex.org/W2807588596, https://openalex.org/W2513739600 |
| referenced_works_count | 18 |
| abstract_inverted_index.a | 2, 12, 76, 80 |
| abstract_inverted_index.We | 0, 52 |
| abstract_inverted_index.an | 18 |
| abstract_inverted_index.et | 49 |
| abstract_inverted_index.in | 11 |
| abstract_inverted_index.is | 25, 61 |
| abstract_inverted_index.of | 4, 29 |
| abstract_inverted_index.on | 35 |
| abstract_inverted_index.to | 20 |
| abstract_inverted_index.we | 16 |
| abstract_inverted_index.Our | 32 |
| abstract_inverted_index.Van | 40, 72 |
| abstract_inverted_index.and | 15, 39, 43, 71, 79 |
| abstract_inverted_index.far | 91 |
| abstract_inverted_index.for | 27, 55 |
| abstract_inverted_index.the | 59, 86 |
| abstract_inverted_index.Roy, | 41, 73 |
| abstract_inverted_index.With | 75 |
| abstract_inverted_index.al., | 50 |
| abstract_inverted_index.seed | 36 |
| abstract_inverted_index.team | 3 |
| abstract_inverted_index.than | 94 |
| abstract_inverted_index.that | 8, 24 |
| abstract_inverted_index.with | 63, 90 |
| abstract_inverted_index.2018) | 42 |
| abstract_inverted_index.fewer | 92 |
| abstract_inverted_index.model | 67 |
| abstract_inverted_index.that, | 54 |
| abstract_inverted_index.value | 45, 83 |
| abstract_inverted_index.2016). | 51 |
| abstract_inverted_index.2018). | 74 |
| abstract_inverted_index.agents | 7, 93 |
| abstract_inverted_index.builds | 34 |
| abstract_inverted_index.common | 13 |
| abstract_inverted_index.learns | 88 |
| abstract_inverted_index.neural | 81 |
| abstract_inverted_index.scale. | 31 |
| abstract_inverted_index.simple | 56 |
| abstract_inverted_index.(Osband | 48 |
| abstract_inverted_index.develop | 17 |
| abstract_inverted_index.methods | 69 |
| abstract_inverted_index.network | 82 |
| abstract_inverted_index.operate | 10 |
| abstract_inverted_index.problem | 78 |
| abstract_inverted_index.quickly | 89 |
| abstract_inverted_index.tabular | 57, 66 |
| abstract_inverted_index.approach | 19, 33, 60, 87 |
| abstract_inverted_index.consider | 1 |
| abstract_inverted_index.function | 46, 84 |
| abstract_inverted_index.learning | 6, 47, 68 |
| abstract_inverted_index.problems | 28 |
| abstract_inverted_index.proposed | 65 |
| abstract_inverted_index.sampling | 37 |
| abstract_inverted_index.schemes. | 97 |
| abstract_inverted_index.suitable | 26 |
| abstract_inverted_index.contexts, | 58 |
| abstract_inverted_index.efficient | 21 |
| abstract_inverted_index.practical | 30 |
| abstract_inverted_index.previously | 64 |
| abstract_inverted_index.randomized | 44 |
| abstract_inverted_index.alternative | 95 |
| abstract_inverted_index.competitive | 62 |
| abstract_inverted_index.coordinated | 22 |
| abstract_inverted_index.demonstrate | 53 |
| abstract_inverted_index.exploration | 23, 96 |
| abstract_inverted_index.concurrently | 9 |
| abstract_inverted_index.environment, | 14 |
| abstract_inverted_index.(Dimakopoulou | 38, 70 |
| abstract_inverted_index.reinforcement | 5 |
| abstract_inverted_index.representation, | 85 |
| abstract_inverted_index.higher-dimensional | 77 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 3 |
| citation_normalized_percentile |
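
The `abstract_inverted_index.*` rows in the payload store the abstract as a mapping from each token to the word positions where it occurs. As a minimal sketch of how such a mapping can be turned back into running text, the Python below places every token at its positions and joins them in order. The function name `rebuild_abstract` and the `sample` dictionary are illustrative assumptions, not part of the payload; `sample` keeps only positions 0 to 7 from the rows above for brevity.

```python
from typing import Dict, List


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Rebuild text from a token -> list-of-positions mapping."""
    # Map each word position to the token that occupies it.
    slots: Dict[int, str] = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            slots[pos] = token
    # Join tokens in ascending position order.
    return " ".join(slots[pos] for pos in sorted(slots))


# Hypothetical sample: the payload rows above, truncated to positions 0-7.
sample = {
    "We": [0],
    "consider": [1],
    "a": [2],
    "team": [3],
    "of": [4],
    "reinforcement": [5],
    "learning": [6],
    "agents": [7],
}

print(rebuild_abstract(sample))
```

Passing the full set of rows (with all positions for each token) should reproduce the abstract shown near the top of the page, since every position from 0 to 97 is covered exactly once.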