Curriculum Learning for Cooperation in Multi-Agent Reinforcement Learning
2023 · Open Access · DOI: https://doi.org/10.48550/arxiv.2312.11768
While there has been significant progress in curriculum learning and continuous learning for training agents to generalize across a wide variety of environments in the context of single-agent reinforcement learning, it is unclear whether these algorithms remain valid in a multi-agent setting. In a competitive setting, a learning agent can be trained by making it compete with a curriculum of increasingly skilled opponents. However, a general intelligent agent should also be able to learn to act around other agents and cooperate with them to achieve common goals. When cooperating with other agents, the learning agent must (a) learn how to perform the task (or subtask), and (b) increase the overall team reward. In this paper, we aim to answer the question of what kind of cooperative teammate, and what curriculum of teammates, a learning agent should be trained with to achieve these two objectives. Our results on the game Overcooked show that a pre-trained teammate who is less skilled is the best teammate for overall team reward but the worst for the agent's own learning. Moreover, somewhat surprisingly, a curriculum of teammates with decreasing skill levels performs better than other types of curricula.
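The abstract does not say how a teammate curriculum is scheduled during training. The following minimal Python sketch only illustrates the ordering idea. It assumes that each pre-trained teammate carries a single scalar skill score and that training episodes are split evenly across curriculum stages. Neither assumption comes from the paper, and `Teammate`, `build_curriculum`, and `schedule` are hypothetical names. It does not reproduce the authors' training setup.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Teammate:
    """Hypothetical record for a frozen, pre-trained partner policy."""
    name: str
    skill: float  # assumed scalar skill score, e.g. average team reward


def build_curriculum(pool: List[Teammate], decreasing: bool = True) -> List[Teammate]:
    """Order teammates by skill; decreasing=True puts the most skilled first."""
    return sorted(pool, key=lambda t: t.skill, reverse=decreasing)


def schedule(curriculum: List[Teammate], total_episodes: int) -> List[str]:
    """Return the teammate name used in each episode, one stage after another.

    Episodes are split evenly across stages; leftover episodes stay in the last stage.
    """
    if not curriculum:
        raise ValueError("curriculum must contain at least one teammate")
    per_stage = max(1, total_episodes // len(curriculum))
    plan = []
    for episode in range(total_episodes):
        # Advance to the next teammate every per_stage episodes, capped at the last one.
        stage = min(episode // per_stage, len(curriculum) - 1)
        plan.append(curriculum[stage].name)
    return plan


if __name__ == "__main__":
    pool = [Teammate("low", 10.0), Teammate("mid", 40.0), Teammate("high", 80.0)]
    print(schedule(build_curriculum(pool, decreasing=True), total_episodes=6))
    # → ['high', 'high', 'mid', 'mid', 'low', 'low']
```

With `decreasing=True`, the learner is paired with the most skilled teammate first and then moves to less skilled teammates. Passing `decreasing=False` instead gives an increasing-skill curriculum, which is one of the alternative orderings a comparison like the one in the abstract could include.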
- Type: preprint
- Language: en
- Landing page: http://arxiv.org/abs/2312.11768 (PDF: https://arxiv.org/pdf/2312.11768)
- OA status: green
- Related works: 10
- OpenAlex ID: https://openalex.org/W4390041669
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4390041669
- DOI: https://doi.org/10.48550/arxiv.2312.11768
- Title: Curriculum Learning for Cooperation in Multi-Agent Reinforcement Learning
- Type: preprint
- Language: en
- Publication year: 2023
- Publication date: 2023-12-19
- Authors (in order): Rupali Bhati, Sai Krishna Gottipati, Clodéric Mars, Matthew E. Taylor
- Landing page: https://arxiv.org/abs/2312.11768
- PDF URL: https://arxiv.org/pdf/2312.11768
- Open access: Yes
- OA status: green
- OA URL: https://arxiv.org/pdf/2312.11768
- Concepts (top fields/topics attached by OpenAlex): Reinforcement learning, Curriculum, Variety (cybernetics), Task (project management), Computer science, Context (archaeology), Artificial intelligence, Error-driven learning, Knowledge management, Psychology, Engineering, Pedagogy, Biology, Systems engineering, Paleontology
- Cited by: 0
- Related works (count): 10
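The Full payload table lists the OpenAlex work record as flattened key paths. Nested objects are joined with dots (`ids.doi`). List elements get an `[i]` index (`topics[0].field.id`). Lists of plain values appear as comma-separated text (`indexed_in`, `related_works`), and nulls become empty cells (`fwci`). The following minimal Python sketch reproduces that convention for an already-parsed JSON record. It is an assumption about how the table was produced, not the page's actual code, and `flatten` is a hypothetical helper name.

```python
from typing import Any, Dict


def flatten(node: Any, prefix: str = "") -> Dict[str, str]:
    """Flatten a parsed JSON value into {key_path: cell_text} pairs."""
    flat: Dict[str, str] = {}
    if isinstance(node, dict):
        # Nested objects: join keys with dots, e.g. "ids.doi".
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            flat.update(flatten(value, path))
    elif isinstance(node, list) and any(isinstance(v, (dict, list)) for v in node):
        # Lists containing objects: index each element, e.g. "topics[0].field.id".
        for i, value in enumerate(node):
            flat.update(flatten(value, f"{prefix}[{i}]"))
    elif isinstance(node, list):
        # Lists of plain values: one comma-separated cell, e.g. "arxiv, datacite".
        flat[prefix] = ", ".join(str(v) for v in node)
    else:
        # Scalars: null becomes an empty cell; booleans print as True/False.
        flat[prefix] = "" if node is None else str(node)
    return flat


if __name__ == "__main__":
    record = {
        "ids": {"doi": "https://doi.org/10.48550/arxiv.2312.11768"},
        "topics": [{"field": {"display_name": "Social Sciences"}}],
        "indexed_in": ["arxiv", "datacite"],
        "fwci": None,
        "is_xpac": False,
    }
    for key, cell in flatten(record).items():
        print(f"| {key} | {cell} |")
```

Because Python dicts keep insertion order, the rows come out in the same order as the keys in the source record.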
Full payload
| id | https://openalex.org/W4390041669 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2312.11768 |
| ids.doi | https://doi.org/10.48550/arxiv.2312.11768 |
| ids.openalex | https://openalex.org/W4390041669 |
| fwci | |
| type | preprint |
| title | Curriculum Learning for Cooperation in Multi-Agent Reinforcement Learning |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10646 |
| topics[0].field.id | https://openalex.org/fields/33 |
| topics[0].field.display_name | Social Sciences |
| topics[0].score | 0.9750000238418579 |
| topics[0].domain.id | https://openalex.org/domains/2 |
| topics[0].domain.display_name | Social Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/3311 |
| topics[0].subfield.display_name | Safety Research |
| topics[0].display_name | Experimental Behavioral Economics Studies |
| topics[1].id | https://openalex.org/T10462 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.953499972820282 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Reinforcement Learning in Robotics |
| topics[2].id | https://openalex.org/T11182 |
| topics[2].field.id | https://openalex.org/fields/18 |
| topics[2].field.display_name | Decision Sciences |
| topics[2].score | 0.9521999955177307 |
| topics[2].domain.id | https://openalex.org/domains/2 |
| topics[2].domain.display_name | Social Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1803 |
| topics[2].subfield.display_name | Management Science and Operations Research |
| topics[2].display_name | Auction Theory and Applications |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C97541855 |
| concepts[0].level | 2 |
| concepts[0].score | 0.7798224687576294 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q830687 |
| concepts[0].display_name | Reinforcement learning |
| concepts[1].id | https://openalex.org/C47177190 |
| concepts[1].level | 2 |
| concepts[1].score | 0.7769895195960999 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q207137 |
| concepts[1].display_name | Curriculum |
| concepts[2].id | https://openalex.org/C136197465 |
| concepts[2].level | 2 |
| concepts[2].score | 0.7060532569885254 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q1729295 |
| concepts[2].display_name | Variety (cybernetics) |
| concepts[3].id | https://openalex.org/C2780451532 |
| concepts[3].level | 2 |
| concepts[3].score | 0.6878733038902283 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q759676 |
| concepts[3].display_name | Task (project management) |
| concepts[4].id | https://openalex.org/C41008148 |
| concepts[4].level | 0 |
| concepts[4].score | 0.6714663505554199 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[4].display_name | Computer science |
| concepts[5].id | https://openalex.org/C2779343474 |
| concepts[5].level | 2 |
| concepts[5].score | 0.6553894281387329 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q3109175 |
| concepts[5].display_name | Context (archaeology) |
| concepts[6].id | https://openalex.org/C154945302 |
| concepts[6].level | 1 |
| concepts[6].score | 0.46738922595977783 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[6].display_name | Artificial intelligence |
| concepts[7].id | https://openalex.org/C47932503 |
| concepts[7].level | 3 |
| concepts[7].score | 0.4230459928512573 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q5395689 |
| concepts[7].display_name | Error-driven learning |
| concepts[8].id | https://openalex.org/C56739046 |
| concepts[8].level | 1 |
| concepts[8].score | 0.4024602174758911 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q192060 |
| concepts[8].display_name | Knowledge management |
| concepts[9].id | https://openalex.org/C15744967 |
| concepts[9].level | 0 |
| concepts[9].score | 0.16905486583709717 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q9418 |
| concepts[9].display_name | Psychology |
| concepts[10].id | https://openalex.org/C127413603 |
| concepts[10].level | 0 |
| concepts[10].score | 0.11114910244941711 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q11023 |
| concepts[10].display_name | Engineering |
| concepts[11].id | https://openalex.org/C19417346 |
| concepts[11].level | 1 |
| concepts[11].score | 0.09985002875328064 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q7922 |
| concepts[11].display_name | Pedagogy |
| concepts[12].id | https://openalex.org/C86803240 |
| concepts[12].level | 0 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q420 |
| concepts[12].display_name | Biology |
| concepts[13].id | https://openalex.org/C201995342 |
| concepts[13].level | 1 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q682496 |
| concepts[13].display_name | Systems engineering |
| concepts[14].id | https://openalex.org/C151730666 |
| concepts[14].level | 1 |
| concepts[14].score | 0.0 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q7205 |
| concepts[14].display_name | Paleontology |
| keywords[0].id | https://openalex.org/keywords/reinforcement-learning |
| keywords[0].score | 0.7798224687576294 |
| keywords[0].display_name | Reinforcement learning |
| keywords[1].id | https://openalex.org/keywords/curriculum |
| keywords[1].score | 0.7769895195960999 |
| keywords[1].display_name | Curriculum |
| keywords[2].id | https://openalex.org/keywords/variety |
| keywords[2].score | 0.7060532569885254 |
| keywords[2].display_name | Variety (cybernetics) |
| keywords[3].id | https://openalex.org/keywords/task |
| keywords[3].score | 0.6878733038902283 |
| keywords[3].display_name | Task (project management) |
| keywords[4].id | https://openalex.org/keywords/computer-science |
| keywords[4].score | 0.6714663505554199 |
| keywords[4].display_name | Computer science |
| keywords[5].id | https://openalex.org/keywords/context |
| keywords[5].score | 0.6553894281387329 |
| keywords[5].display_name | Context (archaeology) |
| keywords[6].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[6].score | 0.46738922595977783 |
| keywords[6].display_name | Artificial intelligence |
| keywords[7].id | https://openalex.org/keywords/error-driven-learning |
| keywords[7].score | 0.4230459928512573 |
| keywords[7].display_name | Error-driven learning |
| keywords[8].id | https://openalex.org/keywords/knowledge-management |
| keywords[8].score | 0.4024602174758911 |
| keywords[8].display_name | Knowledge management |
| keywords[9].id | https://openalex.org/keywords/psychology |
| keywords[9].score | 0.16905486583709717 |
| keywords[9].display_name | Psychology |
| keywords[10].id | https://openalex.org/keywords/engineering |
| keywords[10].score | 0.11114910244941711 |
| keywords[10].display_name | Engineering |
| keywords[11].id | https://openalex.org/keywords/pedagogy |
| keywords[11].score | 0.09985002875328064 |
| keywords[11].display_name | Pedagogy |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2312.11768 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2312.11768 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2312.11768 |
| locations[1].id | doi:10.48550/arxiv.2312.11768 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2312.11768 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5025654535 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Rupali Bhati |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Bhati, Rupali |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5084915709 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Sai Krishna Gottipati |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Gottipati, Sai Krishna |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5049138810 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Clodéric Mars |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Mars, Clodéric |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5070914351 |
| authorships[3].author.orcid | https://orcid.org/0000-0001-8946-0211 |
| authorships[3].author.display_name | Matthew E. Taylor |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Taylor, Matthew E. |
| authorships[3].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2312.11768 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Curriculum Learning for Cooperation in Multi-Agent Reinforcement Learning |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10646 |
| primary_topic.field.id | https://openalex.org/fields/33 |
| primary_topic.field.display_name | Social Sciences |
| primary_topic.score | 0.9750000238418579 |
| primary_topic.domain.id | https://openalex.org/domains/2 |
| primary_topic.domain.display_name | Social Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/3311 |
| primary_topic.subfield.display_name | Safety Research |
| primary_topic.display_name | Experimental Behavioral Economics Studies |
| related_works | https://openalex.org/W2371091044, https://openalex.org/W2171010636, https://openalex.org/W87513465, https://openalex.org/W2391666574, https://openalex.org/W2786230833, https://openalex.org/W3203256658, https://openalex.org/W2352650970, https://openalex.org/W1544514152, https://openalex.org/W1493952344, https://openalex.org/W4312372616 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2312.11768 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2312.11768 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2312.11768 |
| primary_location.id | pmh:oai:arXiv.org:2312.11768 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2312.11768 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2312.11768 |
| publication_date | 2023-12-19 |
| publication_year | 2023 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 18, 41, 45, 48, 59, 66, 130, 135, 154, 181 |
| abstract_inverted_index.In | 44, 114 |
| abstract_inverted_index.be | 38, 52, 72, 138 |
| abstract_inverted_index.by | 54 |
| abstract_inverted_index.if | 33 |
| abstract_inverted_index.in | 6, 23, 40 |
| abstract_inverted_index.is | 31, 158, 161 |
| abstract_inverted_index.it | 30, 56 |
| abstract_inverted_index.of | 21, 26, 61, 123, 126, 132, 175, 183, 194 |
| abstract_inverted_index.on | 148 |
| abstract_inverted_index.to | 15, 74, 76, 85, 101, 119, 141 |
| abstract_inverted_index.we | 117 |
| abstract_inverted_index.(a) | 98 |
| abstract_inverted_index.(b) | 108 |
| abstract_inverted_index.(or | 105 |
| abstract_inverted_index.Our | 146 |
| abstract_inverted_index.act | 77 |
| abstract_inverted_index.aim | 118 |
| abstract_inverted_index.and | 9, 81, 107, 129 |
| abstract_inverted_index.but | 169 |
| abstract_inverted_index.can | 51 |
| abstract_inverted_index.for | 12, 165, 172 |
| abstract_inverted_index.has | 2 |
| abstract_inverted_index.how | 100 |
| abstract_inverted_index.the | 24, 94, 103, 110, 121, 149, 162, 170, 173, 176 |
| abstract_inverted_index.two | 144 |
| abstract_inverted_index.who | 157 |
| abstract_inverted_index.When | 89 |
| abstract_inverted_index.able | 73 |
| abstract_inverted_index.also | 71 |
| abstract_inverted_index.been | 3 |
| abstract_inverted_index.best | 163 |
| abstract_inverted_index.game | 150 |
| abstract_inverted_index.kind | 125 |
| abstract_inverted_index.less | 159 |
| abstract_inverted_index.must | 97 |
| abstract_inverted_index.show | 152 |
| abstract_inverted_index.task | 104 |
| abstract_inverted_index.team | 112, 167 |
| abstract_inverted_index.than | 191 |
| abstract_inverted_index.that | 153 |
| abstract_inverted_index.them | 84 |
| abstract_inverted_index.this | 115 |
| abstract_inverted_index.what | 124 |
| abstract_inverted_index.wide | 19 |
| abstract_inverted_index.with | 58, 83, 91, 140, 185 |
| abstract_inverted_index.While | 0 |
| abstract_inverted_index.agent | 50, 69, 96, 137 |
| abstract_inverted_index.learn | 75, 99 |
| abstract_inverted_index.other | 79, 92, 192 |
| abstract_inverted_index.skill | 187 |
| abstract_inverted_index.still | 37 |
| abstract_inverted_index.there | 1 |
| abstract_inverted_index.these | 34, 143 |
| abstract_inverted_index.types | 193 |
| abstract_inverted_index.valid | 39 |
| abstract_inverted_index.worst | 171 |
| abstract_inverted_index.would | 36 |
| abstract_inverted_index.across | 17 |
| abstract_inverted_index.agent. | 177 |
| abstract_inverted_index.agents | 14, 80 |
| abstract_inverted_index.answer | 120 |
| abstract_inverted_index.around | 78 |
| abstract_inverted_index.better | 190 |
| abstract_inverted_index.common | 87 |
| abstract_inverted_index.goals. | 88 |
| abstract_inverted_index.levels | 188 |
| abstract_inverted_index.making | 55 |
| abstract_inverted_index.paper, | 116 |
| abstract_inverted_index.reward | 168 |
| abstract_inverted_index.should | 70, 134 |
| abstract_inverted_index.achieve | 86, 142 |
| abstract_inverted_index.agents, | 93 |
| abstract_inverted_index.compete | 57 |
| abstract_inverted_index.context | 25 |
| abstract_inverted_index.general | 67 |
| abstract_inverted_index.overall | 111, 166 |
| abstract_inverted_index.perform | 102 |
| abstract_inverted_index.results | 147 |
| abstract_inverted_index.reward. | 113 |
| abstract_inverted_index.skilled | 63, 160 |
| abstract_inverted_index.trained | 53, 139 |
| abstract_inverted_index.unclear | 32 |
| abstract_inverted_index.variety | 20 |
| abstract_inverted_index.However, | 65 |
| abstract_inverted_index.increase | 109 |
| abstract_inverted_index.learning | 8, 11, 49, 95, 136, 174 |
| abstract_inverted_index.performs | 189 |
| abstract_inverted_index.progress | 5 |
| abstract_inverted_index.question | 122 |
| abstract_inverted_index.setting, | 47 |
| abstract_inverted_index.setting. | 43 |
| abstract_inverted_index.somewhat | 179 |
| abstract_inverted_index.teammate | 156, 164 |
| abstract_inverted_index.training | 13 |
| abstract_inverted_index.Moreover, | 178 |
| abstract_inverted_index.cooperate | 82 |
| abstract_inverted_index.learning, | 29 |
| abstract_inverted_index.subtask), | 106 |
| abstract_inverted_index.teammate, | 128 |
| abstract_inverted_index.teammates | 133, 184 |
| abstract_inverted_index.Overcooked | 151 |
| abstract_inverted_index.algorithms | 35 |
| abstract_inverted_index.continuous | 10 |
| abstract_inverted_index.curricula. | 195 |
| abstract_inverted_index.curriculum | 7, 60, 131, 182 |
| abstract_inverted_index.decreasing | 186 |
| abstract_inverted_index.generalize | 16 |
| abstract_inverted_index.opponents. | 64 |
| abstract_inverted_index.competitive | 46 |
| abstract_inverted_index.cooperating | 90 |
| abstract_inverted_index.cooperative | 127 |
| abstract_inverted_index.intelligent | 68 |
| abstract_inverted_index.multi-agent | 42 |
| abstract_inverted_index.objectives. | 145 |
| abstract_inverted_index.pre-trained | 155 |
| abstract_inverted_index.significant | 4 |
| abstract_inverted_index.environments | 22 |
| abstract_inverted_index.increasingly | 62 |
| abstract_inverted_index.single-agent | 27 |
| abstract_inverted_index.reinforcement | 28 |
| abstract_inverted_index.surprisingly, | 180 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 4 |
| citation_normalized_percentile | |
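The `abstract_inverted_index.*` rows store the abstract as an inverted index. Each key is a word, with its punctuation attached, and its value lists the zero-based word positions where that word occurs. For example, `learning` appears at 8, 11, 49, 95, 136, and 174. The following minimal Python sketch turns such an index back into running text. It assumes the index arrives as a dict mapping each word to a list of integer positions, and `rebuild_abstract` is a hypothetical name.

```python
from typing import Dict, List


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Rebuild running text from a {word: [positions]} inverted index."""
    slots: Dict[int, str] = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            slots[pos] = word  # each position holds exactly one word
    # Emit words in position order; positions absent from the index are skipped.
    return " ".join(slots[pos] for pos in sorted(slots))


if __name__ == "__main__":
    # Subset of the table rows above, restricted to positions 0-11.
    sample = {
        "While": [0], "there": [1], "has": [2], "been": [3],
        "significant": [4], "progress": [5], "in": [6], "curriculum": [7],
        "learning": [8, 11], "and": [9], "continuous": [10],
    }
    print(rebuild_abstract(sample))
    # → While there has been significant progress in curriculum learning and continuous learning
```

Applied to the full set of rows, the same function should yield the abstract shown at the top of the page. The function keeps punctuation because the index stores tokens such as `paper,` and `agents,` verbatim.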