Curriculum Learning for Cooperation in Multi-Agent Reinforcement Learning

Rupali Bhati, Sai Krishna Gottipati, Clodéric Mars, Matthew E. Taylor

2023 · Open Access · DOI: https://doi.org/10.48550/arxiv.2312.11768

While there has been significant progress in curriculum learning and continuous learning for training agents to generalize across a wide variety of environments in the context of single-agent reinforcement learning, it is unclear if these algorithms would still be valid in a multi-agent setting. In a competitive setting, a learning agent can be trained by making it compete with a curriculum of increasingly skilled opponents. However, a general intelligent agent should also be able to learn to act around other agents and cooperate with them to achieve common goals. When cooperating with other agents, the learning agent must (a) learn how to perform the task (or subtask), and (b) increase the overall team reward. In this paper, we aim to answer the question of what kind of cooperative teammate, and what curriculum of teammates, a learning agent should be trained with to achieve these two objectives. Our results on the game Overcooked show that a pre-trained teammate who is less skilled is the best teammate for overall team reward but the worst for the learning of the agent. Moreover, somewhat surprisingly, a curriculum of teammates with decreasing skill levels performs better than other types of curricula.

Related Topics

Reinforcement Learning

Curriculum

Computer Science

Artificial Intelligence

Concepts

Reinforcement learning, Curriculum, Variety (cybernetics), Task (project management), Computer science, Context (archaeology), Artificial intelligence, Error-driven learning, Knowledge management, Psychology, Engineering, Pedagogy, Biology, Systems engineering, Paleontology

Metadata

Type: preprint
Language: en
Landing Page: http://arxiv.org/abs/2312.11768
PDF: https://arxiv.org/pdf/2312.11768
OA Status: green
Related Works: 10
OpenAlex ID: https://openalex.org/W4390041669

All OpenAlex metadata

OpenAlex ID: https://openalex.org/W4390041669
Canonical identifier for this work in OpenAlex

DOI: https://doi.org/10.48550/arxiv.2312.11768
Digital Object Identifier

Title: Curriculum Learning for Cooperation in Multi-Agent Reinforcement Learning
Work title

Type: preprint
OpenAlex work type

Language: en
Primary language

Publication year: 2023
Year of publication

Publication date: 2023-12-19
Full publication date if available

Authors: Rupali Bhati, Sai Krishna Gottipati, Clodéric Mars, Matthew E. Taylor
List of authors in order

Landing page: https://arxiv.org/abs/2312.11768
Publisher landing page

PDF URL: https://arxiv.org/pdf/2312.11768
Direct link to full text PDF

Open access: Yes
Whether a free full text is available

OA status: green
Open access status per OpenAlex

OA URL: https://arxiv.org/pdf/2312.11768
Direct OA link when available

Concepts: Reinforcement learning, Curriculum, Variety (cybernetics), Task (project management), Computer science, Context (archaeology), Artificial intelligence, Error-driven learning, Knowledge management, Psychology, Engineering, Pedagogy, Biology, Systems engineering, Paleontology
Top concepts (fields/topics) attached by OpenAlex

Cited by: 0
Total citation count in OpenAlex

Related works (count): 10
Other works algorithmically related by OpenAlex

Full payload

id	https://openalex.org/W4390041669
doi	https://doi.org/10.48550/arxiv.2312.11768
ids.doi	https://doi.org/10.48550/arxiv.2312.11768
ids.openalex	https://openalex.org/W4390041669
fwci
type	preprint
title	Curriculum Learning for Cooperation in Multi-Agent Reinforcement Learning
biblio.issue
biblio.volume
biblio.last_page
biblio.first_page
topics[0].id	https://openalex.org/T10646
topics[0].field.id	https://openalex.org/fields/33
topics[0].field.display_name	Social Sciences
topics[0].score	0.9750000238418579
topics[0].domain.id	https://openalex.org/domains/2
topics[0].domain.display_name	Social Sciences
topics[0].subfield.id	https://openalex.org/subfields/3311
topics[0].subfield.display_name	Safety Research
topics[0].display_name	Experimental Behavioral Economics Studies
topics[1].id	https://openalex.org/T10462
topics[1].field.id	https://openalex.org/fields/17
topics[1].field.display_name	Computer Science
topics[1].score	0.953499972820282
topics[1].domain.id	https://openalex.org/domains/3
topics[1].domain.display_name	Physical Sciences
topics[1].subfield.id	https://openalex.org/subfields/1702
topics[1].subfield.display_name	Artificial Intelligence
topics[1].display_name	Reinforcement Learning in Robotics
topics[2].id	https://openalex.org/T11182
topics[2].field.id	https://openalex.org/fields/18
topics[2].field.display_name	Decision Sciences
topics[2].score	0.9521999955177307
topics[2].domain.id	https://openalex.org/domains/2
topics[2].domain.display_name	Social Sciences
topics[2].subfield.id	https://openalex.org/subfields/1803
topics[2].subfield.display_name	Management Science and Operations Research
topics[2].display_name	Auction Theory and Applications
is_xpac	False
apc_list
apc_paid
concepts[0].id	https://openalex.org/C97541855
concepts[0].level	2
concepts[0].score	0.7798224687576294
concepts[0].wikidata	https://www.wikidata.org/wiki/Q830687
concepts[0].display_name	Reinforcement learning
concepts[1].id	https://openalex.org/C47177190
concepts[1].level	2
concepts[1].score	0.7769895195960999
concepts[1].wikidata	https://www.wikidata.org/wiki/Q207137
concepts[1].display_name	Curriculum
concepts[2].id	https://openalex.org/C136197465
concepts[2].level	2
concepts[2].score	0.7060532569885254
concepts[2].wikidata	https://www.wikidata.org/wiki/Q1729295
concepts[2].display_name	Variety (cybernetics)
concepts[3].id	https://openalex.org/C2780451532
concepts[3].level	2
concepts[3].score	0.6878733038902283
concepts[3].wikidata	https://www.wikidata.org/wiki/Q759676
concepts[3].display_name	Task (project management)
concepts[4].id	https://openalex.org/C41008148
concepts[4].level	0
concepts[4].score	0.6714663505554199
concepts[4].wikidata	https://www.wikidata.org/wiki/Q21198
concepts[4].display_name	Computer science
concepts[5].id	https://openalex.org/C2779343474
concepts[5].level	2
concepts[5].score	0.6553894281387329
concepts[5].wikidata	https://www.wikidata.org/wiki/Q3109175
concepts[5].display_name	Context (archaeology)
concepts[6].id	https://openalex.org/C154945302
concepts[6].level	1
concepts[6].score	0.46738922595977783
concepts[6].wikidata	https://www.wikidata.org/wiki/Q11660
concepts[6].display_name	Artificial intelligence
concepts[7].id	https://openalex.org/C47932503
concepts[7].level	3
concepts[7].score	0.4230459928512573
concepts[7].wikidata	https://www.wikidata.org/wiki/Q5395689
concepts[7].display_name	Error-driven learning
concepts[8].id	https://openalex.org/C56739046
concepts[8].level	1
concepts[8].score	0.4024602174758911
concepts[8].wikidata	https://www.wikidata.org/wiki/Q192060
concepts[8].display_name	Knowledge management
concepts[9].id	https://openalex.org/C15744967
concepts[9].level	0
concepts[9].score	0.16905486583709717
concepts[9].wikidata	https://www.wikidata.org/wiki/Q9418
concepts[9].display_name	Psychology
concepts[10].id	https://openalex.org/C127413603
concepts[10].level	0
concepts[10].score	0.11114910244941711
concepts[10].wikidata	https://www.wikidata.org/wiki/Q11023
concepts[10].display_name	Engineering
concepts[11].id	https://openalex.org/C19417346
concepts[11].level	1
concepts[11].score	0.09985002875328064
concepts[11].wikidata	https://www.wikidata.org/wiki/Q7922
concepts[11].display_name	Pedagogy
concepts[12].id	https://openalex.org/C86803240
concepts[12].level	0
concepts[12].score	0.0
concepts[12].wikidata	https://www.wikidata.org/wiki/Q420
concepts[12].display_name	Biology
concepts[13].id	https://openalex.org/C201995342
concepts[13].level	1
concepts[13].score	0.0
concepts[13].wikidata	https://www.wikidata.org/wiki/Q682496
concepts[13].display_name	Systems engineering
concepts[14].id	https://openalex.org/C151730666
concepts[14].level	1
concepts[14].score	0.0
concepts[14].wikidata	https://www.wikidata.org/wiki/Q7205
concepts[14].display_name	Paleontology
keywords[0].id	https://openalex.org/keywords/reinforcement-learning
keywords[0].score	0.7798224687576294
keywords[0].display_name	Reinforcement learning
keywords[1].id	https://openalex.org/keywords/curriculum
keywords[1].score	0.7769895195960999
keywords[1].display_name	Curriculum
keywords[2].id	https://openalex.org/keywords/variety
keywords[2].score	0.7060532569885254
keywords[2].display_name	Variety (cybernetics)
keywords[3].id	https://openalex.org/keywords/task
keywords[3].score	0.6878733038902283
keywords[3].display_name	Task (project management)
keywords[4].id	https://openalex.org/keywords/computer-science
keywords[4].score	0.6714663505554199
keywords[4].display_name	Computer science
keywords[5].id	https://openalex.org/keywords/context
keywords[5].score	0.6553894281387329
keywords[5].display_name	Context (archaeology)
keywords[6].id	https://openalex.org/keywords/artificial-intelligence
keywords[6].score	0.46738922595977783
keywords[6].display_name	Artificial intelligence
keywords[7].id	https://openalex.org/keywords/error-driven-learning
keywords[7].score	0.4230459928512573
keywords[7].display_name	Error-driven learning
keywords[8].id	https://openalex.org/keywords/knowledge-management
keywords[8].score	0.4024602174758911
keywords[8].display_name	Knowledge management
keywords[9].id	https://openalex.org/keywords/psychology
keywords[9].score	0.16905486583709717
keywords[9].display_name	Psychology
keywords[10].id	https://openalex.org/keywords/engineering
keywords[10].score	0.11114910244941711
keywords[10].display_name	Engineering
keywords[11].id	https://openalex.org/keywords/pedagogy
keywords[11].score	0.09985002875328064
keywords[11].display_name	Pedagogy
language	en
locations[0].id	pmh:oai:arXiv.org:2312.11768
locations[0].is_oa	True
locations[0].source.id	https://openalex.org/S4306400194
locations[0].source.issn
locations[0].source.type	repository
locations[0].source.is_oa	True
locations[0].source.issn_l
locations[0].source.is_core	False
locations[0].source.is_in_doaj	False
locations[0].source.display_name	arXiv (Cornell University)
locations[0].source.host_organization	https://openalex.org/I205783295
locations[0].source.host_organization_name	Cornell University
locations[0].source.host_organization_lineage	https://openalex.org/I205783295
locations[0].license
locations[0].pdf_url	https://arxiv.org/pdf/2312.11768
locations[0].version	submittedVersion
locations[0].raw_type	text
locations[0].license_id
locations[0].is_accepted	False
locations[0].is_published	False
locations[0].raw_source_name
locations[0].landing_page_url	http://arxiv.org/abs/2312.11768
locations[1].id	doi:10.48550/arxiv.2312.11768
locations[1].is_oa	True
locations[1].source.id	https://openalex.org/S4306400194
locations[1].source.issn
locations[1].source.type	repository
locations[1].source.is_oa	True
locations[1].source.issn_l
locations[1].source.is_core	False
locations[1].source.is_in_doaj	False
locations[1].source.display_name	arXiv (Cornell University)
locations[1].source.host_organization	https://openalex.org/I205783295
locations[1].source.host_organization_name	Cornell University
locations[1].source.host_organization_lineage	https://openalex.org/I205783295
locations[1].license
locations[1].pdf_url
locations[1].version
locations[1].raw_type	article
locations[1].license_id
locations[1].is_accepted	False
locations[1].is_published
locations[1].raw_source_name
locations[1].landing_page_url	https://doi.org/10.48550/arxiv.2312.11768
indexed_in	arxiv, datacite
authorships[0].author.id	https://openalex.org/A5025654535
authorships[0].author.orcid
authorships[0].author.display_name	Rupali Bhati
authorships[0].author_position	first
authorships[0].raw_author_name	Bhati, Rupali
authorships[0].is_corresponding	False
authorships[1].author.id	https://openalex.org/A5084915709
authorships[1].author.orcid
authorships[1].author.display_name	Sai Krishna Gottipati
authorships[1].author_position	middle
authorships[1].raw_author_name	Gottipati, Sai Krishna
authorships[1].is_corresponding	False
authorships[2].author.id	https://openalex.org/A5049138810
authorships[2].author.orcid
authorships[2].author.display_name	Clodéric Mars
authorships[2].author_position	middle
authorships[2].raw_author_name	Mars, Clodéric
authorships[2].is_corresponding	False
authorships[3].author.id	https://openalex.org/A5070914351
authorships[3].author.orcid	https://orcid.org/0000-0001-8946-0211
authorships[3].author.display_name	Matthew E. Taylor
authorships[3].author_position	last
authorships[3].raw_author_name	Taylor, Matthew E.
authorships[3].is_corresponding	False
has_content.pdf	False
has_content.grobid_xml	False
is_paratext	False
open_access.is_oa	True
open_access.oa_url	https://arxiv.org/pdf/2312.11768
open_access.oa_status	green
open_access.any_repository_has_fulltext	False
created_date	2025-10-10T00:00:00
display_name	Curriculum Learning for Cooperation in Multi-Agent Reinforcement Learning
has_fulltext	False
is_retracted	False
updated_date	2025-11-06T06:51:31.235846
primary_topic.id	https://openalex.org/T10646
primary_topic.field.id	https://openalex.org/fields/33
primary_topic.field.display_name	Social Sciences
primary_topic.score	0.9750000238418579
primary_topic.domain.id	https://openalex.org/domains/2
primary_topic.domain.display_name	Social Sciences
primary_topic.subfield.id	https://openalex.org/subfields/3311
primary_topic.subfield.display_name	Safety Research
primary_topic.display_name	Experimental Behavioral Economics Studies
related_works	https://openalex.org/W2371091044, https://openalex.org/W2171010636, https://openalex.org/W87513465, https://openalex.org/W2391666574, https://openalex.org/W2786230833, https://openalex.org/W3203256658, https://openalex.org/W2352650970, https://openalex.org/W1544514152, https://openalex.org/W1493952344, https://openalex.org/W4312372616
cited_by_count	0
locations_count	2
best_oa_location.id	pmh:oai:arXiv.org:2312.11768
best_oa_location.is_oa	True
best_oa_location.source.id	https://openalex.org/S4306400194
best_oa_location.source.issn
best_oa_location.source.type	repository
best_oa_location.source.is_oa	True
best_oa_location.source.issn_l
best_oa_location.source.is_core	False
best_oa_location.source.is_in_doaj	False
best_oa_location.source.display_name	arXiv (Cornell University)
best_oa_location.source.host_organization	https://openalex.org/I205783295
best_oa_location.source.host_organization_name	Cornell University
best_oa_location.source.host_organization_lineage	https://openalex.org/I205783295
best_oa_location.license
best_oa_location.pdf_url	https://arxiv.org/pdf/2312.11768
best_oa_location.version	submittedVersion
best_oa_location.raw_type	text
best_oa_location.license_id
best_oa_location.is_accepted	False
best_oa_location.is_published	False
best_oa_location.raw_source_name
best_oa_location.landing_page_url	http://arxiv.org/abs/2312.11768
primary_location.id	pmh:oai:arXiv.org:2312.11768
primary_location.is_oa	True
primary_location.source.id	https://openalex.org/S4306400194
primary_location.source.issn
primary_location.source.type	repository
primary_location.source.is_oa	True
primary_location.source.issn_l
primary_location.source.is_core	False
primary_location.source.is_in_doaj	False
primary_location.source.display_name	arXiv (Cornell University)
primary_location.source.host_organization	https://openalex.org/I205783295
primary_location.source.host_organization_name	Cornell University
primary_location.source.host_organization_lineage	https://openalex.org/I205783295
primary_location.license
primary_location.pdf_url	https://arxiv.org/pdf/2312.11768
primary_location.version	submittedVersion
primary_location.raw_type	text
primary_location.license_id
primary_location.is_accepted	False
primary_location.is_published	False
primary_location.raw_source_name
primary_location.landing_page_url	http://arxiv.org/abs/2312.11768
publication_date	2023-12-19
publication_year	2023
referenced_works_count	0
abstract_inverted_index.a	18, 41, 45, 48, 59, 66, 130, 135, 154, 181
abstract_inverted_index.In	44, 114
abstract_inverted_index.be	38, 52, 72, 138
abstract_inverted_index.by	54
abstract_inverted_index.if	33
abstract_inverted_index.in	6, 23, 40
abstract_inverted_index.is	31, 158, 161
abstract_inverted_index.it	30, 56
abstract_inverted_index.of	21, 26, 61, 123, 126, 132, 175, 183, 194
abstract_inverted_index.on	148
abstract_inverted_index.to	15, 74, 76, 85, 101, 119, 141
abstract_inverted_index.we	117
abstract_inverted_index.(a)	98
abstract_inverted_index.(b)	108
abstract_inverted_index.(or	105
abstract_inverted_index.Our	146
abstract_inverted_index.act	77
abstract_inverted_index.aim	118
abstract_inverted_index.and	9, 81, 107, 129
abstract_inverted_index.but	169
abstract_inverted_index.can	51
abstract_inverted_index.for	12, 165, 172
abstract_inverted_index.has	2
abstract_inverted_index.how	100
abstract_inverted_index.the	24, 94, 103, 110, 121, 149, 162, 170, 173, 176
abstract_inverted_index.two	144
abstract_inverted_index.who	157
abstract_inverted_index.When	89
abstract_inverted_index.able	73
abstract_inverted_index.also	71
abstract_inverted_index.been	3
abstract_inverted_index.best	163
abstract_inverted_index.game	150
abstract_inverted_index.kind	125
abstract_inverted_index.less	159
abstract_inverted_index.must	97
abstract_inverted_index.show	152
abstract_inverted_index.task	104
abstract_inverted_index.team	112, 167
abstract_inverted_index.than	191
abstract_inverted_index.that	153
abstract_inverted_index.them	84
abstract_inverted_index.this	115
abstract_inverted_index.what	124
abstract_inverted_index.wide	19
abstract_inverted_index.with	58, 83, 91, 140, 185
abstract_inverted_index.While	0
abstract_inverted_index.agent	50, 69, 96, 137
abstract_inverted_index.learn	75, 99
abstract_inverted_index.other	79, 92, 192
abstract_inverted_index.skill	187
abstract_inverted_index.still	37
abstract_inverted_index.there	1
abstract_inverted_index.these	34, 143
abstract_inverted_index.types	193
abstract_inverted_index.valid	39
abstract_inverted_index.worst	171
abstract_inverted_index.would	36
abstract_inverted_index.across	17
abstract_inverted_index.agent.	177
abstract_inverted_index.agents	14, 80
abstract_inverted_index.answer	120
abstract_inverted_index.around	78
abstract_inverted_index.better	190
abstract_inverted_index.common	87
abstract_inverted_index.goals.	88
abstract_inverted_index.levels	188
abstract_inverted_index.making	55
abstract_inverted_index.paper,	116
abstract_inverted_index.reward	168
abstract_inverted_index.should	70, 134
abstract_inverted_index.achieve	86, 142
abstract_inverted_index.agents,	93
abstract_inverted_index.compete	57
abstract_inverted_index.context	25
abstract_inverted_index.general	67
abstract_inverted_index.overall	111, 166
abstract_inverted_index.perform	102
abstract_inverted_index.results	147
abstract_inverted_index.reward.	113
abstract_inverted_index.skilled	63, 160
abstract_inverted_index.trained	53, 139
abstract_inverted_index.unclear	32
abstract_inverted_index.variety	20
abstract_inverted_index.However,	65
abstract_inverted_index.increase	109
abstract_inverted_index.learning	8, 11, 49, 95, 136, 174
abstract_inverted_index.performs	189
abstract_inverted_index.progress	5
abstract_inverted_index.question	122
abstract_inverted_index.setting,	47
abstract_inverted_index.setting.	43
abstract_inverted_index.somewhat	179
abstract_inverted_index.teammate	156, 164
abstract_inverted_index.training	13
abstract_inverted_index.Moreover,	178
abstract_inverted_index.cooperate	82
abstract_inverted_index.learning,	29
abstract_inverted_index.subtask),	106
abstract_inverted_index.teammate,	128
abstract_inverted_index.teammates	133, 184
abstract_inverted_index.Overcooked	151
abstract_inverted_index.algorithms	35
abstract_inverted_index.continuous	10
abstract_inverted_index.curricula.	195
abstract_inverted_index.curriculum	7, 60, 131, 182
abstract_inverted_index.decreasing	186
abstract_inverted_index.generalize	16
abstract_inverted_index.opponents.	64
abstract_inverted_index.competitive	46
abstract_inverted_index.cooperating	90
abstract_inverted_index.cooperative	127
abstract_inverted_index.intelligent	68
abstract_inverted_index.multi-agent	42
abstract_inverted_index.objectives.	145
abstract_inverted_index.pre-trained	155
abstract_inverted_index.significant	4
abstract_inverted_index.environments	22
abstract_inverted_index.increasingly	62
abstract_inverted_index.single-agent	27
abstract_inverted_index.reinforcement	28
abstract_inverted_index.surprisingly,	180
cited_by_percentile_year
countries_distinct_count	0
institutions_distinct_count	4
citation_normalized_percentile
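
The abstract_inverted_index rows in the payload map each token of the abstract to the word positions where it occurs. The following is a minimal Python sketch of how the readable abstract could be rebuilt from those rows. It assumes each row is a dotted key and a value separated by a tab, as displayed on this page; the function names parse_flat_rows and rebuild_abstract and the sample_rows list are hypothetical and are not part of any OpenAlex client.

```python
# Sketch only: rebuild abstract text from flattened "abstract_inverted_index" rows.
# Assumption: each row looks like "abstract_inverted_index.<token>\t<pos>, <pos>, ...".

PREFIX = "abstract_inverted_index."


def parse_flat_rows(rows):
    """Collect {token: [positions]} from tab-separated payload rows."""
    index = {}
    for row in rows:
        key, _, value = row.partition("\t")
        # Skip rows from other parts of the payload and rows with no value.
        if key.startswith(PREFIX) and value.strip():
            token = key[len(PREFIX):]
            index[token] = [int(pos) for pos in value.split(",")]
    return index


def rebuild_abstract(inverted_index):
    """Place every token at each of its positions and join them in order."""
    by_position = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    return " ".join(by_position[pos] for pos in sorted(by_position))


# A few rows copied from the payload above (positions 0 to 5 only).
sample_rows = [
    "abstract_inverted_index.has\t2",
    "abstract_inverted_index.been\t3",
    "abstract_inverted_index.While\t0",
    "abstract_inverted_index.there\t1",
    "abstract_inverted_index.progress\t5",
    "abstract_inverted_index.significant\t4",
    "cited_by_count\t0",  # unrelated row, ignored by the parser
]

abstract_start = rebuild_abstract(parse_flat_rows(sample_rows))
print(abstract_start)
```

Feeding all abstract_inverted_index rows through the same two functions should reproduce the abstract in word order, since the positions run contiguously from 0 to 195 in this record.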