Stochastic matrix games with bandit feedback. Article Swipe
We study a version of the classical zero-sum matrix game with unknown payoff matrix and bandit feedback, where the players only observe each others actions and a noisy payoff. This generalizes the usual matrix game, where the payoff matrix is known to the players. Despite numerous applications, this problem has received relatively little attention. Although adversarial bandit algorithms achieve low regret, they do not exploit the matrix structure and perform poorly relative to the new algorithms. The main contributions are regret analyses of variants of UCB and K-learning that hold for any opponent, e.g., even when the opponent adversarially plays the best response to the learner's mixed strategy. Along the way, we show that Thompson fails catastrophically in this setting and provide empirical comparison to existing algorithms.
Related Topics
- Type
- preprint
- Language
- en
- Landing Page
- https://ui.adsabs.harvard.edu/abs/2020arXiv200605145O/abstract
- OA Status
- green
- Cited By
- 4
- References
- 29
- Related Works
- 20
- OpenAlex ID
- https://openalex.org/W3035312163
Raw OpenAlex JSON
- OpenAlex ID
-
https://openalex.org/W3035312163Canonical identifier for this work in OpenAlex
- Title
-
Stochastic matrix games with bandit feedback.Work title
- Type
-
preprintOpenAlex work type
- Language
-
enPrimary language
- Publication year
-
2020Year of publication
- Publication date
-
2020-06-09Full publication date if available
- Authors
-
Brendan O’Donoghue, Tor Lattimore, Ian OsbandList of authors in order
- Landing page
-
https://ui.adsabs.harvard.edu/abs/2020arXiv200605145O/abstractPublisher landing page
- Open access
-
YesWhether a free full text is available
- OA status
-
greenOpen access status per OpenAlex
- OA URL
-
https://ui.adsabs.harvard.edu/abs/2020arXiv200605145O/abstractDirect OA link when available
- Concepts
-
Regret, Stochastic game, Fictitious play, Matrix (chemical analysis), Normal-form game, Computer science, Exploit, Adversary, Mathematical optimization, Mathematics, Mathematical economics, Artificial intelligence, Repeated game, Game theory, Nash equilibrium, Machine learning, Composite material, Materials science, Computer securityTop concepts (fields/topics) attached by OpenAlex
- Cited by
-
4Total citation count in OpenAlex
- Citations by year (recent)
-
2022: 1, 2021: 3Per-year citation counts (last 5 years)
- References (count)
-
29Number of works referenced by this work
- Related works (count)
-
20Other works algorithmically related by OpenAlex
Full payload
| id | https://openalex.org/W3035312163 |
|---|---|
| doi | |
| ids.mag | 3035312163 |
| ids.openalex | https://openalex.org/W3035312163 |
| fwci | |
| type | preprint |
| title | Stochastic matrix games with bandit feedback. |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T12101 |
| topics[0].field.id | https://openalex.org/fields/18 |
| topics[0].field.display_name | Decision Sciences |
| topics[0].score | 1.0 |
| topics[0].domain.id | https://openalex.org/domains/2 |
| topics[0].domain.display_name | Social Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1803 |
| topics[0].subfield.display_name | Management Science and Operations Research |
| topics[0].display_name | Advanced Bandit Algorithms Research |
| topics[1].id | https://openalex.org/T10462 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9916999936103821 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Reinforcement Learning in Robotics |
| topics[2].id | https://openalex.org/T12072 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9904000163078308 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Machine Learning and Algorithms |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C50817715 |
| concepts[0].level | 2 |
| concepts[0].score | 0.8119308948516846 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q79895177 |
| concepts[0].display_name | Regret |
| concepts[1].id | https://openalex.org/C22171661 |
| concepts[1].level | 2 |
| concepts[1].score | 0.716268002986908 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q1074380 |
| concepts[1].display_name | Stochastic game |
| concepts[2].id | https://openalex.org/C145071142 |
| concepts[2].level | 3 |
| concepts[2].score | 0.622489333152771 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q1411116 |
| concepts[2].display_name | Fictitious play |
| concepts[3].id | https://openalex.org/C106487976 |
| concepts[3].level | 2 |
| concepts[3].score | 0.6147667169570923 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q685816 |
| concepts[3].display_name | Matrix (chemical analysis) |
| concepts[4].id | https://openalex.org/C155930848 |
| concepts[4].level | 4 |
| concepts[4].score | 0.5635859966278076 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q1069099 |
| concepts[4].display_name | Normal-form game |
| concepts[5].id | https://openalex.org/C41008148 |
| concepts[5].level | 0 |
| concepts[5].score | 0.5142723917961121 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[5].display_name | Computer science |
| concepts[6].id | https://openalex.org/C165696696 |
| concepts[6].level | 2 |
| concepts[6].score | 0.4896445870399475 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q11287 |
| concepts[6].display_name | Exploit |
| concepts[7].id | https://openalex.org/C41065033 |
| concepts[7].level | 2 |
| concepts[7].score | 0.4768832325935364 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q2825412 |
| concepts[7].display_name | Adversary |
| concepts[8].id | https://openalex.org/C126255220 |
| concepts[8].level | 1 |
| concepts[8].score | 0.445272296667099 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q141495 |
| concepts[8].display_name | Mathematical optimization |
| concepts[9].id | https://openalex.org/C33923547 |
| concepts[9].level | 0 |
| concepts[9].score | 0.3725767731666565 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[9].display_name | Mathematics |
| concepts[10].id | https://openalex.org/C144237770 |
| concepts[10].level | 1 |
| concepts[10].score | 0.3591892123222351 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q747534 |
| concepts[10].display_name | Mathematical economics |
| concepts[11].id | https://openalex.org/C154945302 |
| concepts[11].level | 1 |
| concepts[11].score | 0.32801949977874756 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[11].display_name | Artificial intelligence |
| concepts[12].id | https://openalex.org/C202556891 |
| concepts[12].level | 3 |
| concepts[12].score | 0.3065246343612671 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q1584646 |
| concepts[12].display_name | Repeated game |
| concepts[13].id | https://openalex.org/C177142836 |
| concepts[13].level | 2 |
| concepts[13].score | 0.272982120513916 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q44455 |
| concepts[13].display_name | Game theory |
| concepts[14].id | https://openalex.org/C46814582 |
| concepts[14].level | 2 |
| concepts[14].score | 0.2224770188331604 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q23389 |
| concepts[14].display_name | Nash equilibrium |
| concepts[15].id | https://openalex.org/C119857082 |
| concepts[15].level | 1 |
| concepts[15].score | 0.16836926341056824 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q2539 |
| concepts[15].display_name | Machine learning |
| concepts[16].id | https://openalex.org/C159985019 |
| concepts[16].level | 1 |
| concepts[16].score | 0.0 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q181790 |
| concepts[16].display_name | Composite material |
| concepts[17].id | https://openalex.org/C192562407 |
| concepts[17].level | 0 |
| concepts[17].score | 0.0 |
| concepts[17].wikidata | https://www.wikidata.org/wiki/Q228736 |
| concepts[17].display_name | Materials science |
| concepts[18].id | https://openalex.org/C38652104 |
| concepts[18].level | 1 |
| concepts[18].score | 0.0 |
| concepts[18].wikidata | https://www.wikidata.org/wiki/Q3510521 |
| concepts[18].display_name | Computer security |
| keywords[0].id | https://openalex.org/keywords/regret |
| keywords[0].score | 0.8119308948516846 |
| keywords[0].display_name | Regret |
| keywords[1].id | https://openalex.org/keywords/stochastic-game |
| keywords[1].score | 0.716268002986908 |
| keywords[1].display_name | Stochastic game |
| keywords[2].id | https://openalex.org/keywords/fictitious-play |
| keywords[2].score | 0.622489333152771 |
| keywords[2].display_name | Fictitious play |
| keywords[3].id | https://openalex.org/keywords/matrix |
| keywords[3].score | 0.6147667169570923 |
| keywords[3].display_name | Matrix (chemical analysis) |
| keywords[4].id | https://openalex.org/keywords/normal-form-game |
| keywords[4].score | 0.5635859966278076 |
| keywords[4].display_name | Normal-form game |
| keywords[5].id | https://openalex.org/keywords/computer-science |
| keywords[5].score | 0.5142723917961121 |
| keywords[5].display_name | Computer science |
| keywords[6].id | https://openalex.org/keywords/exploit |
| keywords[6].score | 0.4896445870399475 |
| keywords[6].display_name | Exploit |
| keywords[7].id | https://openalex.org/keywords/adversary |
| keywords[7].score | 0.4768832325935364 |
| keywords[7].display_name | Adversary |
| keywords[8].id | https://openalex.org/keywords/mathematical-optimization |
| keywords[8].score | 0.445272296667099 |
| keywords[8].display_name | Mathematical optimization |
| keywords[9].id | https://openalex.org/keywords/mathematics |
| keywords[9].score | 0.3725767731666565 |
| keywords[9].display_name | Mathematics |
| keywords[10].id | https://openalex.org/keywords/mathematical-economics |
| keywords[10].score | 0.3591892123222351 |
| keywords[10].display_name | Mathematical economics |
| keywords[11].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[11].score | 0.32801949977874756 |
| keywords[11].display_name | Artificial intelligence |
| keywords[12].id | https://openalex.org/keywords/repeated-game |
| keywords[12].score | 0.3065246343612671 |
| keywords[12].display_name | Repeated game |
| keywords[13].id | https://openalex.org/keywords/game-theory |
| keywords[13].score | 0.272982120513916 |
| keywords[13].display_name | Game theory |
| keywords[14].id | https://openalex.org/keywords/nash-equilibrium |
| keywords[14].score | 0.2224770188331604 |
| keywords[14].display_name | Nash equilibrium |
| keywords[15].id | https://openalex.org/keywords/machine-learning |
| keywords[15].score | 0.16836926341056824 |
| keywords[15].display_name | Machine learning |
| language | en |
| locations[0].id | mag:3035312163 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | |
| locations[0].version | submittedVersion |
| locations[0].raw_type | |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | arXiv (Cornell University) |
| locations[0].landing_page_url | https://ui.adsabs.harvard.edu/abs/2020arXiv200605145O/abstract |
| authorships[0].author.id | https://openalex.org/A5027179922 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Brendan O’Donoghue |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Brendan O'Donoghue |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5017031238 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Tor Lattimore |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Tor Lattimore |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5015899120 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Ian Osband |
| authorships[2].author_position | last |
| authorships[2].raw_author_name | Ian Osband |
| authorships[2].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://ui.adsabs.harvard.edu/abs/2020arXiv200605145O/abstract |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2020-06-19T00:00:00 |
| display_name | Stochastic matrix games with bandit feedback. |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-10-10T17:16:08.811792 |
| primary_topic.id | https://openalex.org/T12101 |
| primary_topic.field.id | https://openalex.org/fields/18 |
| primary_topic.field.display_name | Decision Sciences |
| primary_topic.score | 1.0 |
| primary_topic.domain.id | https://openalex.org/domains/2 |
| primary_topic.domain.display_name | Social Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1803 |
| primary_topic.subfield.display_name | Management Science and Operations Research |
| primary_topic.display_name | Advanced Bandit Algorithms Research |
| related_works | https://openalex.org/W3168278989, https://openalex.org/W2752505153, https://openalex.org/W3129266365, https://openalex.org/W2952551823, https://openalex.org/W3200627097, https://openalex.org/W3000234344, https://openalex.org/W3159696719, https://openalex.org/W2964295480, https://openalex.org/W1826788, https://openalex.org/W2289468512, https://openalex.org/W3120693675, https://openalex.org/W1560033549, https://openalex.org/W1967250398, https://openalex.org/W2913912720, https://openalex.org/W1916836981, https://openalex.org/W2105918522, https://openalex.org/W2949909914, https://openalex.org/W3017394432, https://openalex.org/W3196517148, https://openalex.org/W2141744833 |
| cited_by_count | 4 |
| counts_by_year[0].year | 2022 |
| counts_by_year[0].cited_by_count | 1 |
| counts_by_year[1].year | 2021 |
| counts_by_year[1].cited_by_count | 3 |
| locations_count | 1 |
| best_oa_location.id | mag:3035312163 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | arXiv (Cornell University) |
| best_oa_location.landing_page_url | https://ui.adsabs.harvard.edu/abs/2020arXiv200605145O/abstract |
| primary_location.id | mag:3035312163 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | |
| primary_location.version | submittedVersion |
| primary_location.raw_type | |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | arXiv (Cornell University) |
| primary_location.landing_page_url | https://ui.adsabs.harvard.edu/abs/2020arXiv200605145O/abstract |
| publication_date | 2020-06-09 |
| publication_year | 2020 |
| referenced_works | https://openalex.org/W2100415632, https://openalex.org/W2996251520, https://openalex.org/W2962901934, https://openalex.org/W2129516068, https://openalex.org/W2963007936, https://openalex.org/W2951371685, https://openalex.org/W2945871487, https://openalex.org/W2884559200, https://openalex.org/W2077902449, https://openalex.org/W2075567596, https://openalex.org/W610661469, https://openalex.org/W2153129425, https://openalex.org/W2288174618, https://openalex.org/W3124603229, https://openalex.org/W2067050450, https://openalex.org/W2149721706, https://openalex.org/W1973885534, https://openalex.org/W2039522160, https://openalex.org/W2783292962, https://openalex.org/W2971977212, https://openalex.org/W2913950235, https://openalex.org/W2755611049, https://openalex.org/W2116067849, https://openalex.org/W2957302152, https://openalex.org/W2168405694, https://openalex.org/W2964014706, https://openalex.org/W1838526237, https://openalex.org/W2625705959, https://openalex.org/W1510812122 |
| referenced_works_count | 29 |
| abstract_inverted_index.a | 2, 26 |
| abstract_inverted_index.We | 0 |
| abstract_inverted_index.do | 62 |
| abstract_inverted_index.in | 117 |
| abstract_inverted_index.is | 39 |
| abstract_inverted_index.of | 4, 82, 84 |
| abstract_inverted_index.to | 41, 72, 103, 124 |
| abstract_inverted_index.we | 111 |
| abstract_inverted_index.The | 76 |
| abstract_inverted_index.UCB | 85 |
| abstract_inverted_index.and | 14, 25, 68, 86, 120 |
| abstract_inverted_index.any | 91 |
| abstract_inverted_index.are | 79 |
| abstract_inverted_index.for | 90 |
| abstract_inverted_index.has | 49 |
| abstract_inverted_index.low | 59 |
| abstract_inverted_index.new | 74 |
| abstract_inverted_index.not | 63 |
| abstract_inverted_index.the | 5, 18, 31, 36, 42, 65, 73, 96, 100, 104, 109 |
| abstract_inverted_index.This | 29 |
| abstract_inverted_index.best | 101 |
| abstract_inverted_index.each | 22 |
| abstract_inverted_index.even | 94 |
| abstract_inverted_index.game | 9 |
| abstract_inverted_index.hold | 89 |
| abstract_inverted_index.main | 77 |
| abstract_inverted_index.only | 20 |
| abstract_inverted_index.show | 112 |
| abstract_inverted_index.that | 88, 113 |
| abstract_inverted_index.they | 61 |
| abstract_inverted_index.this | 47, 118 |
| abstract_inverted_index.way, | 110 |
| abstract_inverted_index.when | 95 |
| abstract_inverted_index.with | 10 |
| abstract_inverted_index.Along | 108 |
| abstract_inverted_index.e.g., | 93 |
| abstract_inverted_index.fails | 115 |
| abstract_inverted_index.game, | 34 |
| abstract_inverted_index.known | 40 |
| abstract_inverted_index.mixed | 106 |
| abstract_inverted_index.noisy | 27 |
| abstract_inverted_index.plays | 99 |
| abstract_inverted_index.study | 1 |
| abstract_inverted_index.usual | 32 |
| abstract_inverted_index.where | 17, 35 |
| abstract_inverted_index.bandit | 15, 56 |
| abstract_inverted_index.little | 52 |
| abstract_inverted_index.matrix | 8, 13, 33, 38, 66 |
| abstract_inverted_index.others | 23 |
| abstract_inverted_index.payoff | 12, 37 |
| abstract_inverted_index.poorly | 70 |
| abstract_inverted_index.regret | 80 |
| abstract_inverted_index.Despite | 44 |
| abstract_inverted_index.achieve | 58 |
| abstract_inverted_index.actions | 24 |
| abstract_inverted_index.exploit | 64 |
| abstract_inverted_index.observe | 21 |
| abstract_inverted_index.payoff. | 28 |
| abstract_inverted_index.perform | 69 |
| abstract_inverted_index.players | 19 |
| abstract_inverted_index.problem | 48 |
| abstract_inverted_index.provide | 121 |
| abstract_inverted_index.regret, | 60 |
| abstract_inverted_index.setting | 119 |
| abstract_inverted_index.unknown | 11 |
| abstract_inverted_index.version | 3 |
| abstract_inverted_index.Although | 54 |
| abstract_inverted_index.Thompson | 114 |
| abstract_inverted_index.analyses | 81 |
| abstract_inverted_index.existing | 125 |
| abstract_inverted_index.numerous | 45 |
| abstract_inverted_index.opponent | 97 |
| abstract_inverted_index.players. | 43 |
| abstract_inverted_index.received | 50 |
| abstract_inverted_index.relative | 71 |
| abstract_inverted_index.response | 102 |
| abstract_inverted_index.variants | 83 |
| abstract_inverted_index.zero-sum | 7 |
| abstract_inverted_index.classical | 6 |
| abstract_inverted_index.empirical | 122 |
| abstract_inverted_index.feedback, | 16 |
| abstract_inverted_index.learner's | 105 |
| abstract_inverted_index.opponent, | 92 |
| abstract_inverted_index.strategy. | 107 |
| abstract_inverted_index.structure | 67 |
| abstract_inverted_index.K-learning | 87 |
| abstract_inverted_index.algorithms | 57 |
| abstract_inverted_index.attention. | 53 |
| abstract_inverted_index.comparison | 123 |
| abstract_inverted_index.relatively | 51 |
| abstract_inverted_index.adversarial | 55 |
| abstract_inverted_index.algorithms. | 75, 126 |
| abstract_inverted_index.generalizes | 30 |
| abstract_inverted_index.adversarially | 98 |
| abstract_inverted_index.applications, | 46 |
| abstract_inverted_index.contributions | 78 |
| abstract_inverted_index.catastrophically | 116 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 3 |
| citation_normalized_percentile |