Making Sense of Reinforcement Learning and Probabilistic Inference
2020 · Open Access
· DOI: https://doi.org/10.48550/arxiv.2001.00805
Reinforcement learning (RL) combines a control problem with statistical estimation: the system dynamics are not known to the agent, but can be learned through experience. A recent line of research casts "RL as inference" and suggests a particular framework to generalize the RL problem as probabilistic inference. Our paper surfaces a key shortcoming in that approach, and clarifies the sense in which RL can be coherently cast as an inference problem. In particular, an RL agent must consider the effects of its actions upon future rewards and observations: the exploration-exploitation tradeoff. In all but the simplest settings, the resulting inference is computationally intractable, so practical RL algorithms must resort to approximation. We demonstrate that the popular "RL as inference" approximation can perform poorly even in very basic problems. However, we show that with a small modification the framework does yield algorithms that can provably perform well, and we show that the resulting algorithm is equivalent to the recently proposed K-learning, which we further connect with Thompson sampling.
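The exploration-exploitation tradeoff named in the abstract can be illustrated with textbook Thompson sampling on a Bernoulli bandit. The sketch below is not the paper's K-learning algorithm or its experimental setup; the function name `thompson_sampling`, the uniform Beta(1, 1) prior, and the arm means are all assumptions chosen for illustration.

```python
import random


def thompson_sampling(true_means, n_rounds, seed=0):
    """Textbook Thompson sampling for a Bernoulli bandit (illustrative only).

    true_means: hypothetical success probability of each arm, hidden from the agent.
    Returns (pull counts per arm, total reward collected).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    # Beta(1, 1) prior on each arm's mean: an assumed, uninformative choice.
    alpha = [1] * n_arms
    beta = [1] * n_arms
    pulls = [0] * n_arms
    total_reward = 0

    for _ in range(n_rounds):
        # Draw one plausible mean per arm from its current posterior.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        # Act greedily with respect to the sampled means; uncertain arms
        # occasionally produce large samples, which drives exploration.
        arm = max(range(n_arms), key=samples.__getitem__)
        reward = 1 if rng.random() < true_means[arm] else 0
        # Conjugate Beta-Bernoulli posterior update.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
        total_reward += reward

    return pulls, total_reward
```

Calling `thompson_sampling([0.1, 0.9], 2000)` returns the per-arm pull counts and the total reward. Over a run of that length the pulls should concentrate on the better arm as its posterior sharpens.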
- Type: article
- Language: en
- Landing Page: http://arxiv.org/abs/2001.00805, https://arxiv.org/pdf/2001.00805
- OA Status: green
- Cited By: 12
- References: 46
- Related Works: 20
- OpenAlex ID: https://openalex.org/W2996251520
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W2996251520 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2001.00805 (Digital Object Identifier)
- Title: Making Sense of Reinforcement Learning and Probabilistic Inference (work title)
- Type: article (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2020 (year of publication)
- Publication date: 2020-01-03 (full publication date if available)
- Authors: Brendan O’Donoghue, Ian Osband, Catalin Ionescu (list of authors in order)
- Landing page: https://arxiv.org/abs/2001.00805 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2001.00805 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2001.00805 (direct OA link when available)
- Concepts: Reinforcement learning, Inference, Computer science, Probabilistic logic, Artificial intelligence, Machine learning, Simple (philosophy), Key (lock), Thompson sampling, Approximate inference, Statistical inference, Theoretical computer science, Bayesian probability, Mathematics, Computer security, Philosophy, Statistics, Epistemology (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 12 (total citation count in OpenAlex)
- Citations by year (recent): 2023: 1, 2022: 2, 2021: 3, 2020: 5, 2018: 1 (per-year citation counts, last 5 years)
- References (count): 46 (number of works referenced by this work)
- Related works (count): 20 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W2996251520 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2001.00805 |
| ids.doi | https://doi.org/10.48550/arxiv.2001.00805 |
| ids.mag | 2996251520 |
| ids.openalex | https://openalex.org/W2996251520 |
| fwci | |
| type | article |
| title | Making Sense of Reinforcement Learning and Probabilistic Inference |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10462 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9983999729156494 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Reinforcement Learning in Robotics |
| topics[1].id | https://openalex.org/T11975 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9922999739646912 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Evolutionary Algorithms and Applications |
| topics[2].id | https://openalex.org/T12101 |
| topics[2].field.id | https://openalex.org/fields/18 |
| topics[2].field.display_name | Decision Sciences |
| topics[2].score | 0.9886000156402588 |
| topics[2].domain.id | https://openalex.org/domains/2 |
| topics[2].domain.display_name | Social Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1803 |
| topics[2].subfield.display_name | Management Science and Operations Research |
| topics[2].display_name | Advanced Bandit Algorithms Research |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C97541855 |
| concepts[0].level | 2 |
| concepts[0].score | 0.8535065650939941 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q830687 |
| concepts[0].display_name | Reinforcement learning |
| concepts[1].id | https://openalex.org/C2776214188 |
| concepts[1].level | 2 |
| concepts[1].score | 0.8119436502456665 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q408386 |
| concepts[1].display_name | Inference |
| concepts[2].id | https://openalex.org/C41008148 |
| concepts[2].level | 0 |
| concepts[2].score | 0.7029316425323486 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[2].display_name | Computer science |
| concepts[3].id | https://openalex.org/C49937458 |
| concepts[3].level | 2 |
| concepts[3].score | 0.6054031252861023 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q2599292 |
| concepts[3].display_name | Probabilistic logic |
| concepts[4].id | https://openalex.org/C154945302 |
| concepts[4].level | 1 |
| concepts[4].score | 0.5981602072715759 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[4].display_name | Artificial intelligence |
| concepts[5].id | https://openalex.org/C119857082 |
| concepts[5].level | 1 |
| concepts[5].score | 0.5168355107307434 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q2539 |
| concepts[5].display_name | Machine learning |
| concepts[6].id | https://openalex.org/C2780586882 |
| concepts[6].level | 2 |
| concepts[6].score | 0.4742618799209595 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q7520643 |
| concepts[6].display_name | Simple (philosophy) |
| concepts[7].id | https://openalex.org/C26517878 |
| concepts[7].level | 2 |
| concepts[7].score | 0.42917773127555847 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q228039 |
| concepts[7].display_name | Key (lock) |
| concepts[8].id | https://openalex.org/C73602740 |
| concepts[8].level | 3 |
| concepts[8].score | 0.42317166924476624 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q7795822 |
| concepts[8].display_name | Thompson sampling |
| concepts[9].id | https://openalex.org/C2777472644 |
| concepts[9].level | 3 |
| concepts[9].score | 0.4227999448776245 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q16968992 |
| concepts[9].display_name | Approximate inference |
| concepts[10].id | https://openalex.org/C134261354 |
| concepts[10].level | 2 |
| concepts[10].score | 0.41461944580078125 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q938438 |
| concepts[10].display_name | Statistical inference |
| concepts[11].id | https://openalex.org/C80444323 |
| concepts[11].level | 1 |
| concepts[11].score | 0.3350437879562378 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q2878974 |
| concepts[11].display_name | Theoretical computer science |
| concepts[12].id | https://openalex.org/C107673813 |
| concepts[12].level | 2 |
| concepts[12].score | 0.18123915791511536 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q812534 |
| concepts[12].display_name | Bayesian probability |
| concepts[13].id | https://openalex.org/C33923547 |
| concepts[13].level | 0 |
| concepts[13].score | 0.17057141661643982 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[13].display_name | Mathematics |
| concepts[14].id | https://openalex.org/C38652104 |
| concepts[14].level | 1 |
| concepts[14].score | 0.0 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q3510521 |
| concepts[14].display_name | Computer security |
| concepts[15].id | https://openalex.org/C138885662 |
| concepts[15].level | 0 |
| concepts[15].score | 0.0 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q5891 |
| concepts[15].display_name | Philosophy |
| concepts[16].id | https://openalex.org/C105795698 |
| concepts[16].level | 1 |
| concepts[16].score | 0.0 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q12483 |
| concepts[16].display_name | Statistics |
| concepts[17].id | https://openalex.org/C111472728 |
| concepts[17].level | 1 |
| concepts[17].score | 0.0 |
| concepts[17].wikidata | https://www.wikidata.org/wiki/Q9471 |
| concepts[17].display_name | Epistemology |
| keywords[0].id | https://openalex.org/keywords/reinforcement-learning |
| keywords[0].score | 0.8535065650939941 |
| keywords[0].display_name | Reinforcement learning |
| keywords[1].id | https://openalex.org/keywords/inference |
| keywords[1].score | 0.8119436502456665 |
| keywords[1].display_name | Inference |
| keywords[2].id | https://openalex.org/keywords/computer-science |
| keywords[2].score | 0.7029316425323486 |
| keywords[2].display_name | Computer science |
| keywords[3].id | https://openalex.org/keywords/probabilistic-logic |
| keywords[3].score | 0.6054031252861023 |
| keywords[3].display_name | Probabilistic logic |
| keywords[4].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[4].score | 0.5981602072715759 |
| keywords[4].display_name | Artificial intelligence |
| keywords[5].id | https://openalex.org/keywords/machine-learning |
| keywords[5].score | 0.5168355107307434 |
| keywords[5].display_name | Machine learning |
| keywords[6].id | https://openalex.org/keywords/simple |
| keywords[6].score | 0.4742618799209595 |
| keywords[6].display_name | Simple (philosophy) |
| keywords[7].id | https://openalex.org/keywords/key |
| keywords[7].score | 0.42917773127555847 |
| keywords[7].display_name | Key (lock) |
| keywords[8].id | https://openalex.org/keywords/thompson-sampling |
| keywords[8].score | 0.42317166924476624 |
| keywords[8].display_name | Thompson sampling |
| keywords[9].id | https://openalex.org/keywords/approximate-inference |
| keywords[9].score | 0.4227999448776245 |
| keywords[9].display_name | Approximate inference |
| keywords[10].id | https://openalex.org/keywords/statistical-inference |
| keywords[10].score | 0.41461944580078125 |
| keywords[10].display_name | Statistical inference |
| keywords[11].id | https://openalex.org/keywords/theoretical-computer-science |
| keywords[11].score | 0.3350437879562378 |
| keywords[11].display_name | Theoretical computer science |
| keywords[12].id | https://openalex.org/keywords/bayesian-probability |
| keywords[12].score | 0.18123915791511536 |
| keywords[12].display_name | Bayesian probability |
| keywords[13].id | https://openalex.org/keywords/mathematics |
| keywords[13].score | 0.17057141661643982 |
| keywords[13].display_name | Mathematics |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2001.00805 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2001.00805 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2001.00805 |
| locations[1].id | mag:2998430347 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | submittedVersion |
| locations[1].raw_type | |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | False |
| locations[1].raw_source_name | arXiv (Cornell University) |
| locations[1].landing_page_url | https://arxiv.org/pdf/2001.00805.pdf |
| locations[2].id | doi:10.48550/arxiv.2001.00805 |
| locations[2].is_oa | True |
| locations[2].source.id | https://openalex.org/S4306400194 |
| locations[2].source.issn | |
| locations[2].source.type | repository |
| locations[2].source.is_oa | True |
| locations[2].source.issn_l | |
| locations[2].source.is_core | False |
| locations[2].source.is_in_doaj | False |
| locations[2].source.display_name | arXiv (Cornell University) |
| locations[2].source.host_organization | https://openalex.org/I205783295 |
| locations[2].source.host_organization_name | Cornell University |
| locations[2].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[2].license | |
| locations[2].pdf_url | |
| locations[2].version | |
| locations[2].raw_type | article |
| locations[2].license_id | |
| locations[2].is_accepted | False |
| locations[2].is_published | |
| locations[2].raw_source_name | |
| locations[2].landing_page_url | https://doi.org/10.48550/arxiv.2001.00805 |
| locations[3].id | mag:2996251520 |
| locations[3].is_oa | False |
| locations[3].source.id | https://openalex.org/S4306419637 |
| locations[3].source.issn | |
| locations[3].source.type | conference |
| locations[3].source.is_oa | False |
| locations[3].source.issn_l | |
| locations[3].source.is_core | False |
| locations[3].source.is_in_doaj | False |
| locations[3].source.display_name | International Conference on Learning Representations |
| locations[3].source.host_organization | |
| locations[3].source.host_organization_name | |
| locations[3].license | |
| locations[3].pdf_url | |
| locations[3].version | |
| locations[3].raw_type | |
| locations[3].license_id | |
| locations[3].is_accepted | False |
| locations[3].is_published | |
| locations[3].raw_source_name | International Conference on Learning Representations |
| locations[3].landing_page_url | https://www.openreview.net/pdf?id=S1xitgHtvS |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5027179922 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Brendan O’Donoghue |
| authorships[0].countries | US |
| authorships[0].affiliations[0].institution_ids | https://openalex.org/I1291425158 |
| authorships[0].affiliations[0].raw_affiliation_string | Google (United States), Mountain View, United States |
| authorships[0].institutions[0].id | https://openalex.org/I1291425158 |
| authorships[0].institutions[0].ror | https://ror.org/00njsd438 |
| authorships[0].institutions[0].type | company |
| authorships[0].institutions[0].lineage | https://openalex.org/I1291425158, https://openalex.org/I4210128969 |
| authorships[0].institutions[0].country_code | US |
| authorships[0].institutions[0].display_name | Google (United States) |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Brendan O'Donoghue |
| authorships[0].is_corresponding | False |
| authorships[0].raw_affiliation_strings | Google (United States), Mountain View, United States |
| authorships[1].author.id | https://openalex.org/A5015899120 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Ian Osband |
| authorships[1].countries | US |
| authorships[1].affiliations[0].institution_ids | https://openalex.org/I1291425158 |
| authorships[1].affiliations[0].raw_affiliation_string | Google (United States), Mountain View, United States |
| authorships[1].institutions[0].id | https://openalex.org/I1291425158 |
| authorships[1].institutions[0].ror | https://ror.org/00njsd438 |
| authorships[1].institutions[0].type | company |
| authorships[1].institutions[0].lineage | https://openalex.org/I1291425158, https://openalex.org/I4210128969 |
| authorships[1].institutions[0].country_code | US |
| authorships[1].institutions[0].display_name | Google (United States) |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Ian Osband |
| authorships[1].is_corresponding | False |
| authorships[1].raw_affiliation_strings | Google (United States), Mountain View, United States |
| authorships[2].author.id | https://openalex.org/A5046449484 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Catalin Ionescu |
| authorships[2].countries | US |
| authorships[2].affiliations[0].institution_ids | https://openalex.org/I1291425158 |
| authorships[2].affiliations[0].raw_affiliation_string | Google (United States), Mountain View, United States |
| authorships[2].institutions[0].id | https://openalex.org/I1291425158 |
| authorships[2].institutions[0].ror | https://ror.org/00njsd438 |
| authorships[2].institutions[0].type | company |
| authorships[2].institutions[0].lineage | https://openalex.org/I1291425158, https://openalex.org/I4210128969 |
| authorships[2].institutions[0].country_code | US |
| authorships[2].institutions[0].display_name | Google (United States) |
| authorships[2].author_position | last |
| authorships[2].raw_author_name | Catalin Ionescu |
| authorships[2].is_corresponding | False |
| authorships[2].raw_affiliation_strings | Google (United States), Mountain View, United States |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2001.00805 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2019-12-26T00:00:00 |
| display_name | Making Sense of Reinforcement Learning and Probabilistic Inference |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10462 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9983999729156494 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Reinforcement Learning in Robotics |
| related_works | https://openalex.org/W2936107880, https://openalex.org/W3090106354, https://openalex.org/W3196528513, https://openalex.org/W3184258323, https://openalex.org/W3213789840, https://openalex.org/W3172461472, https://openalex.org/W3203592351, https://openalex.org/W2514775068, https://openalex.org/W3046755562, https://openalex.org/W3178256563, https://openalex.org/W2996148148, https://openalex.org/W3170914142, https://openalex.org/W2145957964, https://openalex.org/W3100936971, https://openalex.org/W1972336342, https://openalex.org/W2884559200, https://openalex.org/W2799151646, https://openalex.org/W2039522160, https://openalex.org/W1486341833, https://openalex.org/W3152815381 |
| cited_by_count | 12 |
| counts_by_year[0].year | 2023 |
| counts_by_year[0].cited_by_count | 1 |
| counts_by_year[1].year | 2022 |
| counts_by_year[1].cited_by_count | 2 |
| counts_by_year[2].year | 2021 |
| counts_by_year[2].cited_by_count | 3 |
| counts_by_year[3].year | 2020 |
| counts_by_year[3].cited_by_count | 5 |
| counts_by_year[4].year | 2018 |
| counts_by_year[4].cited_by_count | 1 |
| locations_count | 4 |
| best_oa_location.id | pmh:oai:arXiv.org:2001.00805 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2001.00805 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2001.00805 |
| primary_location.id | pmh:oai:arXiv.org:2001.00805 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2001.00805 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2001.00805 |
| publication_date | 2020-01-03 |
| publication_year | 2020 |
| referenced_works | https://openalex.org/W2107464055, https://openalex.org/W2967210407, https://openalex.org/W2974778612, https://openalex.org/W2039522160, https://openalex.org/W2964043796, https://openalex.org/W2046495522, https://openalex.org/W3125634603, https://openalex.org/W2296360731, https://openalex.org/W2963099933, https://openalex.org/W2093524643, https://openalex.org/W2145060720, https://openalex.org/W2020677283, https://openalex.org/W2108738385, https://openalex.org/W2518564545, https://openalex.org/W2963938771, https://openalex.org/W2167117957, https://openalex.org/W2257979135, https://openalex.org/W2951266961, https://openalex.org/W2123157758, https://openalex.org/W2594103415, https://openalex.org/W2963158178, https://openalex.org/W2107662876, https://openalex.org/W2106164082, https://openalex.org/W2964121744, https://openalex.org/W2489939061, https://openalex.org/W2799151646, https://openalex.org/W2962723954, https://openalex.org/W2098774185, https://openalex.org/W2145938889, https://openalex.org/W242065599, https://openalex.org/W1511986666, https://openalex.org/W2963170229, https://openalex.org/W2963884015, https://openalex.org/W2155772159, https://openalex.org/W1499669280, https://openalex.org/W1757796397, https://openalex.org/W1850488217, https://openalex.org/W2129670787, https://openalex.org/W3011120880, https://openalex.org/W2157477959, https://openalex.org/W2121863487, https://openalex.org/W2963751259, https://openalex.org/W2611591252, https://openalex.org/W2884559200, https://openalex.org/W2963438456, https://openalex.org/W2962767126 |
| referenced_works_count | 46 |
| abstract_inverted_index.A | 25 |
| abstract_inverted_index.a | 4, 36, 50, 135 |
| abstract_inverted_index.In | 71, 91 |
| abstract_inverted_index.RL | 42, 62, 74, 107 |
| abstract_inverted_index.We | 113 |
| abstract_inverted_index.an | 68, 73 |
| abstract_inverted_index.as | 32, 44, 67, 119 |
| abstract_inverted_index.be | 21, 64 |
| abstract_inverted_index.in | 53, 60, 125 |
| abstract_inverted_index.is | 101, 155 |
| abstract_inverted_index.of | 28, 80 |
| abstract_inverted_index.so | 104 |
| abstract_inverted_index.to | 16, 39, 111, 157 |
| abstract_inverted_index.we | 131, 149, 163 |
| abstract_inverted_index.Our | 47 |
| abstract_inverted_index.The | 10, 88 |
| abstract_inverted_index.`RL | 31, 118 |
| abstract_inverted_index.all | 92 |
| abstract_inverted_index.and | 34, 56, 86, 148 |
| abstract_inverted_index.are | 13 |
| abstract_inverted_index.but | 19, 93 |
| abstract_inverted_index.can | 20, 63, 122, 144 |
| abstract_inverted_index.its | 81 |
| abstract_inverted_index.key | 51 |
| abstract_inverted_index.not | 14 |
| abstract_inverted_index.the | 17, 41, 58, 78, 94, 98, 116, 138, 152, 158 |
| abstract_inverted_index.(RL) | 2 |
| abstract_inverted_index.cast | 66 |
| abstract_inverted_index.does | 140 |
| abstract_inverted_index.even | 126 |
| abstract_inverted_index.line | 27 |
| abstract_inverted_index.most | 95 |
| abstract_inverted_index.must | 76, 109 |
| abstract_inverted_index.show | 132, 150 |
| abstract_inverted_index.that | 54, 105, 115, 133, 143, 151 |
| abstract_inverted_index.upon | 83 |
| abstract_inverted_index.very | 127 |
| abstract_inverted_index.with | 7, 134, 166 |
| abstract_inverted_index.agent | 75 |
| abstract_inverted_index.basic | 128 |
| abstract_inverted_index.casts | 30 |
| abstract_inverted_index.known | 15 |
| abstract_inverted_index.paper | 48 |
| abstract_inverted_index.sense | 59 |
| abstract_inverted_index.small | 136 |
| abstract_inverted_index.well, | 147 |
| abstract_inverted_index.which | 61, 162 |
| abstract_inverted_index.yield | 141 |
| abstract_inverted_index.agent, | 18 |
| abstract_inverted_index.future | 84 |
| abstract_inverted_index.poorly | 124 |
| abstract_inverted_index.recent | 26 |
| abstract_inverted_index.resort | 110 |
| abstract_inverted_index.simple | 96 |
| abstract_inverted_index.system | 11 |
| abstract_inverted_index.actions | 82 |
| abstract_inverted_index.connect | 165 |
| abstract_inverted_index.control | 5 |
| abstract_inverted_index.effects | 79 |
| abstract_inverted_index.further | 164 |
| abstract_inverted_index.learned | 22 |
| abstract_inverted_index.perform | 123, 146 |
| abstract_inverted_index.popular | 117 |
| abstract_inverted_index.problem | 6, 43 |
| abstract_inverted_index.rewards | 85 |
| abstract_inverted_index.through | 23 |
| abstract_inverted_index.However, | 130 |
| abstract_inverted_index.Thompson | 167 |
| abstract_inverted_index.combines | 3 |
| abstract_inverted_index.consider | 77 |
| abstract_inverted_index.dynamics | 12 |
| abstract_inverted_index.learning | 1 |
| abstract_inverted_index.problem. | 70 |
| abstract_inverted_index.proposed | 160 |
| abstract_inverted_index.provably | 145 |
| abstract_inverted_index.recently | 159 |
| abstract_inverted_index.research | 29 |
| abstract_inverted_index.suggests | 35 |
| abstract_inverted_index.surfaces | 49 |
| abstract_inverted_index.algorithm | 154 |
| abstract_inverted_index.approach, | 55 |
| abstract_inverted_index.clarifies | 57 |
| abstract_inverted_index.framework | 38, 139 |
| abstract_inverted_index.inference | 69, 100 |
| abstract_inverted_index.practical | 106 |
| abstract_inverted_index.problems. | 129 |
| abstract_inverted_index.resulting | 99, 153 |
| abstract_inverted_index.sampling. | 168 |
| abstract_inverted_index.settings, | 97 |
| abstract_inverted_index.tradeoff. | 90 |
| abstract_inverted_index.algorithms | 108, 142 |
| abstract_inverted_index.coherently | 65 |
| abstract_inverted_index.equivalent | 156 |
| abstract_inverted_index.generalize | 40 |
| abstract_inverted_index.inference' | 33, 120 |
| abstract_inverted_index.inference. | 46 |
| abstract_inverted_index.particular | 37 |
| abstract_inverted_index.K-learning, | 161 |
| abstract_inverted_index.demonstrate | 114 |
| abstract_inverted_index.estimation: | 9 |
| abstract_inverted_index.experience. | 24 |
| abstract_inverted_index.intractable | 103 |
| abstract_inverted_index.particular, | 72 |
| abstract_inverted_index.shortcoming | 52 |
| abstract_inverted_index.statistical | 8 |
| abstract_inverted_index.modification | 137 |
| abstract_inverted_index.Reinforcement | 0 |
| abstract_inverted_index.approximation | 121 |
| abstract_inverted_index.observations: | 87 |
| abstract_inverted_index.probabilistic | 45 |
| abstract_inverted_index.approximation. | 112 |
| abstract_inverted_index.computationally | 102 |
| abstract_inverted_index.exploration-exploitation | 89 |
| cited_by_percentile_year | |
| countries_distinct_count | 1 |
| institutions_distinct_count | 3 |
| citation_normalized_percentile |
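
The `abstract_inverted_index.*` rows in the payload map each token of the abstract to the zero-based positions where it occurs. As a minimal sketch, assuming the index has already been loaded into a plain dictionary of token to position list, the text can be rebuilt by inverting that mapping. The helper name `reconstruct_abstract` and the `excerpt` dictionary are illustrative; the excerpt holds only the entries for positions 0 to 6 copied from the table.

```python
def reconstruct_abstract(inverted_index):
    """Rebuild running text from a {token: [positions]} inverted index."""
    by_position = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    # Emit tokens in position order, joined by single spaces.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Excerpt of the payload rows above, restricted to positions 0-6.
excerpt = {
    "Reinforcement": [0],
    "learning": [1],
    "(RL)": [2],
    "combines": [3],
    "a": [4],
    "control": [5],
    "problem": [6],
}

print(reconstruct_abstract(excerpt))
# prints: Reinforcement learning (RL) combines a control problem
```

Positions missing from the dictionary are skipped rather than padded, so a partial index yields a partial but correctly ordered text.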