GAWM: Global-Aware World Model for Multi-Agent Reinforcement Learning

Zifeng Shi, Meiqin Liu, Senlin Zhang, Ronghao Zheng, Shanling Dong, Ping Wei

2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2501.10116

In recent years, Model-based Multi-Agent Reinforcement Learning (MARL) has demonstrated significant advantages over model-free methods in sample efficiency by using independent world models of the environment dynamics to augment the training data. However, when the sample budget is not the limiting factor, these methods still lag behind model-free methods in final convergence performance and stability. This is primarily due to the world model's insufficient and unstable representation of global states in partially observable environments. This limitation makes it hard to ensure global consistency in the data samples and results in a time-varying, unstable distribution mismatch between the pseudo-samples generated by the world model and the real samples, an issue that becomes particularly pronounced in more complex multi-agent environments. To address this challenge, we propose GAWM, a model-based MARL method that enhances the centralized world model's ability to achieve a globally unified and accurate representation of state information while adhering to the centralized training with decentralized execution (CTDE) paradigm. GAWM uses an additional Transformer architecture to fuse local observation information from different agents, thereby improving its ability to extract and represent global state information. This enhancement improves both sample efficiency and training stability, leading to superior convergence performance and allowing model-based methods to be applied effectively in complex and challenging multi-agent environments. Experimental results demonstrate that GAWM outperforms various model-free and model-based approaches on the challenging domains of SMAC (the StarCraft Multi-Agent Challenge).
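
The abstract states only that a Transformer fuses the agents' local observations into a global representation; it gives no architectural details. As a rough illustration of that idea, and not the authors' implementation, the sketch below runs one pass of scaled dot-product self-attention over per-agent observation vectors and mean-pools the result. The function name `fuse_observations`, the use of raw observations as queries, keys and values (no learned projections), and the mean pooling are all assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_observations(observations):
    """Fuse per-agent observation vectors into one global feature vector.

    Assumptions (illustrative only, not taken from the paper):
    - queries, keys and values are the raw observations themselves;
    - a single attention head is used;
    - the global feature is the mean of the attended vectors.
    """
    dim = len(observations[0])
    scale = math.sqrt(dim)

    attended = []
    for query in observations:
        # Similarity of this agent's observation to every agent's observation.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in observations
        ]
        weights = softmax(scores)
        # Attention output: weighted sum of all agents' observations.
        attended.append([
            sum(w * obs[d] for w, obs in zip(weights, observations))
            for d in range(dim)
        ])

    # Mean-pool across agents to obtain a single fused vector.
    count = len(attended)
    return [sum(vec[d] for vec in attended) / count for d in range(dim)]


if __name__ == "__main__":
    local_obs = [[0.2, 0.7], [0.9, 0.1], [0.4, 0.4]]  # three hypothetical agents
    print(fuse_observations(local_obs))
```

Because this sketch has no positional encoding, reordering the agents leaves the fused vector unchanged up to floating-point rounding. A real model would add learned projections, multiple heads and layers.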

Related Topics

Reinforcement Learning

Computer Science

Artificial Intelligence

Social Psychology

Concepts

Reinforcement learning, Computer science, Reinforcement, Artificial intelligence, Psychology, Social psychology

Metadata

Type: preprint
Language: en
Landing Page: http://arxiv.org/abs/2501.10116
PDF: https://arxiv.org/pdf/2501.10116
OA Status: green
Related Works: 10
OpenAlex ID: https://openalex.org/W4406603986

All OpenAlex metadata

OpenAlex ID: https://openalex.org/W4406603986
Canonical identifier for this work in OpenAlex

DOI: https://doi.org/10.48550/arxiv.2501.10116
Digital Object Identifier

Title: GAWM: Global-Aware World Model for Multi-Agent Reinforcement Learning
Work title

Type: preprint
OpenAlex work type

Language: en
Primary language

Publication year: 2025
Year of publication

Publication date: 2025-01-17
Full publication date if available

Authors: Zifeng Shi, Meiqin Liu, Senlin Zhang, Ronghao Zheng, Shanling Dong, Ping Wei
List of authors in order

Landing page: https://arxiv.org/abs/2501.10116
Publisher landing page

PDF URL: https://arxiv.org/pdf/2501.10116
Direct link to full text PDF

Open access: Yes
Whether a free full text is available

OA status: green
Open access status per OpenAlex

OA URL: https://arxiv.org/pdf/2501.10116
Direct OA link when available

Concepts: Reinforcement learning, Computer science, Reinforcement, Artificial intelligence, Psychology, Social psychology
Top concepts (fields/topics) attached by OpenAlex

Cited by: 0
Total citation count in OpenAlex

Related works (count): 10
Other works algorithmically related by OpenAlex

Full payload

id	https://openalex.org/W4406603986
doi	https://doi.org/10.48550/arxiv.2501.10116
ids.doi	https://doi.org/10.48550/arxiv.2501.10116
ids.openalex	https://openalex.org/W4406603986
fwci
type	preprint
title	GAWM: Global-Aware World Model for Multi-Agent Reinforcement Learning
biblio.issue
biblio.volume
biblio.last_page
biblio.first_page
topics[0].id	https://openalex.org/T10462
topics[0].field.id	https://openalex.org/fields/17
topics[0].field.display_name	Computer Science
topics[0].score	0.7109000086784363
topics[0].domain.id	https://openalex.org/domains/3
topics[0].domain.display_name	Physical Sciences
topics[0].subfield.id	https://openalex.org/subfields/1702
topics[0].subfield.display_name	Artificial Intelligence
topics[0].display_name	Reinforcement Learning in Robotics
is_xpac	False
apc_list
apc_paid
concepts[0].id	https://openalex.org/C97541855
concepts[0].level	2
concepts[0].score	0.826128363609314
concepts[0].wikidata	https://www.wikidata.org/wiki/Q830687
concepts[0].display_name	Reinforcement learning
concepts[1].id	https://openalex.org/C41008148
concepts[1].level	0
concepts[1].score	0.5357286930084229
concepts[1].wikidata	https://www.wikidata.org/wiki/Q21198
concepts[1].display_name	Computer science
concepts[2].id	https://openalex.org/C67203356
concepts[2].level	2
concepts[2].score	0.4641820788383484
concepts[2].wikidata	https://www.wikidata.org/wiki/Q1321905
concepts[2].display_name	Reinforcement
concepts[3].id	https://openalex.org/C154945302
concepts[3].level	1
concepts[3].score	0.32758933305740356
concepts[3].wikidata	https://www.wikidata.org/wiki/Q11660
concepts[3].display_name	Artificial intelligence
concepts[4].id	https://openalex.org/C15744967
concepts[4].level	0
concepts[4].score	0.15009543299674988
concepts[4].wikidata	https://www.wikidata.org/wiki/Q9418
concepts[4].display_name	Psychology
concepts[5].id	https://openalex.org/C77805123
concepts[5].level	1
concepts[5].score	0.052991628646850586
concepts[5].wikidata	https://www.wikidata.org/wiki/Q161272
concepts[5].display_name	Social psychology
keywords[0].id	https://openalex.org/keywords/reinforcement-learning
keywords[0].score	0.826128363609314
keywords[0].display_name	Reinforcement learning
keywords[1].id	https://openalex.org/keywords/computer-science
keywords[1].score	0.5357286930084229
keywords[1].display_name	Computer science
keywords[2].id	https://openalex.org/keywords/reinforcement
keywords[2].score	0.4641820788383484
keywords[2].display_name	Reinforcement
keywords[3].id	https://openalex.org/keywords/artificial-intelligence
keywords[3].score	0.32758933305740356
keywords[3].display_name	Artificial intelligence
keywords[4].id	https://openalex.org/keywords/psychology
keywords[4].score	0.15009543299674988
keywords[4].display_name	Psychology
keywords[5].id	https://openalex.org/keywords/social-psychology
keywords[5].score	0.052991628646850586
keywords[5].display_name	Social psychology
language	en
locations[0].id	pmh:oai:arXiv.org:2501.10116
locations[0].is_oa	True
locations[0].source.id	https://openalex.org/S4306400194
locations[0].source.issn
locations[0].source.type	repository
locations[0].source.is_oa	True
locations[0].source.issn_l
locations[0].source.is_core	False
locations[0].source.is_in_doaj	False
locations[0].source.display_name	arXiv (Cornell University)
locations[0].source.host_organization	https://openalex.org/I205783295
locations[0].source.host_organization_name	Cornell University
locations[0].source.host_organization_lineage	https://openalex.org/I205783295
locations[0].license
locations[0].pdf_url	https://arxiv.org/pdf/2501.10116
locations[0].version	submittedVersion
locations[0].raw_type	text
locations[0].license_id
locations[0].is_accepted	False
locations[0].is_published	False
locations[0].raw_source_name
locations[0].landing_page_url	http://arxiv.org/abs/2501.10116
locations[1].id	doi:10.48550/arxiv.2501.10116
locations[1].is_oa	True
locations[1].source.id	https://openalex.org/S4306400194
locations[1].source.issn
locations[1].source.type	repository
locations[1].source.is_oa	True
locations[1].source.issn_l
locations[1].source.is_core	False
locations[1].source.is_in_doaj	False
locations[1].source.display_name	arXiv (Cornell University)
locations[1].source.host_organization	https://openalex.org/I205783295
locations[1].source.host_organization_name	Cornell University
locations[1].source.host_organization_lineage	https://openalex.org/I205783295
locations[1].license
locations[1].pdf_url
locations[1].version
locations[1].raw_type	article
locations[1].license_id
locations[1].is_accepted	False
locations[1].is_published
locations[1].raw_source_name
locations[1].landing_page_url	https://doi.org/10.48550/arxiv.2501.10116
indexed_in	arxiv, datacite
authorships[0].author.id	https://openalex.org/A5062190927
authorships[0].author.orcid
authorships[0].author.display_name	Zifeng Shi
authorships[0].author_position	first
authorships[0].raw_author_name	Shi, Zifeng
authorships[0].is_corresponding	False
authorships[1].author.id	https://openalex.org/A5033808402
authorships[1].author.orcid	https://orcid.org/0000-0003-0693-6574
authorships[1].author.display_name	Meiqin Liu
authorships[1].author_position	middle
authorships[1].raw_author_name	Liu, Meiqin
authorships[1].is_corresponding	False
authorships[2].author.id	https://openalex.org/A5003643230
authorships[2].author.orcid	https://orcid.org/0000-0001-5117-3110
authorships[2].author.display_name	Senlin Zhang
authorships[2].author_position	middle
authorships[2].raw_author_name	Zhang, Senlin
authorships[2].is_corresponding	False
authorships[3].author.id	https://openalex.org/A5048107920
authorships[3].author.orcid	https://orcid.org/0000-0002-9095-5905
authorships[3].author.display_name	Ronghao Zheng
authorships[3].author_position	middle
authorships[3].raw_author_name	Zheng, Ronghao
authorships[3].is_corresponding	False
authorships[4].author.id	https://openalex.org/A5004998505
authorships[4].author.orcid	https://orcid.org/0000-0002-1754-1829
authorships[4].author.display_name	Shanling Dong
authorships[4].author_position	middle
authorships[4].raw_author_name	Dong, Shanling
authorships[4].is_corresponding	False
authorships[5].author.id	https://openalex.org/A5101947241
authorships[5].author.orcid	https://orcid.org/0000-0002-8535-9527
authorships[5].author.display_name	Ping Wei
authorships[5].author_position	last
authorships[5].raw_author_name	Wei, Ping
authorships[5].is_corresponding	False
has_content.pdf	False
has_content.grobid_xml	False
is_paratext	False
open_access.is_oa	True
open_access.oa_url	https://arxiv.org/pdf/2501.10116
open_access.oa_status	green
open_access.any_repository_has_fulltext	False
created_date	2025-10-10T00:00:00
display_name	GAWM: Global-Aware World Model for Multi-Agent Reinforcement Learning
has_fulltext	False
is_retracted	False
updated_date	2025-11-06T06:51:31.235846
primary_topic.id	https://openalex.org/T10462
primary_topic.field.id	https://openalex.org/fields/17
primary_topic.field.display_name	Computer Science
primary_topic.score	0.7109000086784363
primary_topic.domain.id	https://openalex.org/domains/3
primary_topic.domain.display_name	Physical Sciences
primary_topic.subfield.id	https://openalex.org/subfields/1702
primary_topic.subfield.display_name	Artificial Intelligence
primary_topic.display_name	Reinforcement Learning in Robotics
related_works	https://openalex.org/W4391375266, https://openalex.org/W2899084033, https://openalex.org/W2748952813, https://openalex.org/W4310083477, https://openalex.org/W2328553770, https://openalex.org/W2920061524, https://openalex.org/W1977959518, https://openalex.org/W2038908348, https://openalex.org/W2107890255, https://openalex.org/W2106552856
cited_by_count	0
locations_count	2
best_oa_location.id	pmh:oai:arXiv.org:2501.10116
best_oa_location.is_oa	True
best_oa_location.source.id	https://openalex.org/S4306400194
best_oa_location.source.issn
best_oa_location.source.type	repository
best_oa_location.source.is_oa	True
best_oa_location.source.issn_l
best_oa_location.source.is_core	False
best_oa_location.source.is_in_doaj	False
best_oa_location.source.display_name	arXiv (Cornell University)
best_oa_location.source.host_organization	https://openalex.org/I205783295
best_oa_location.source.host_organization_name	Cornell University
best_oa_location.source.host_organization_lineage	https://openalex.org/I205783295
best_oa_location.license
best_oa_location.pdf_url	https://arxiv.org/pdf/2501.10116
best_oa_location.version	submittedVersion
best_oa_location.raw_type	text
best_oa_location.license_id
best_oa_location.is_accepted	False
best_oa_location.is_published	False
best_oa_location.raw_source_name
best_oa_location.landing_page_url	http://arxiv.org/abs/2501.10116
primary_location.id	pmh:oai:arXiv.org:2501.10116
primary_location.is_oa	True
primary_location.source.id	https://openalex.org/S4306400194
primary_location.source.issn
primary_location.source.type	repository
primary_location.source.is_oa	True
primary_location.source.issn_l
primary_location.source.is_core	False
primary_location.source.is_in_doaj	False
primary_location.source.display_name	arXiv (Cornell University)
primary_location.source.host_organization	https://openalex.org/I205783295
primary_location.source.host_organization_name	Cornell University
primary_location.source.host_organization_lineage	https://openalex.org/I205783295
primary_location.license
primary_location.pdf_url	https://arxiv.org/pdf/2501.10116
primary_location.version	submittedVersion
primary_location.raw_type	text
primary_location.license_id
primary_location.is_accepted	False
primary_location.is_published	False
primary_location.raw_source_name
primary_location.landing_page_url	http://arxiv.org/abs/2501.10116
publication_date	2025-01-17
publication_year	2025
referenced_works_count	0
abstract_inverted_index.a	88, 124
abstract_inverted_index.In	0
abstract_inverted_index.To	118
abstract_inverted_index.an	156
abstract_inverted_index.be	209
abstract_inverted_index.by	20, 100
abstract_inverted_index.in	15, 45, 68, 81, 87, 113, 197, 231
abstract_inverted_index.is	54
abstract_inverted_index.of	17, 47, 65, 144, 235
abstract_inverted_index.to	57, 77, 137, 149, 160, 172, 192, 208, 212
abstract_inverted_index.we	122
abstract_inverted_index.and	51, 62, 85, 90, 104, 141, 174, 199, 225
abstract_inverted_index.but	186
abstract_inverted_index.due	56
abstract_inverted_index.for	27
abstract_inverted_index.has	8
abstract_inverted_index.its	170
abstract_inverted_index.lag	41
abstract_inverted_index.not	181
abstract_inverted_index.the	34, 58, 75, 82, 95, 101, 105, 132, 150, 232
abstract_inverted_index.CTDE	151
abstract_inverted_index.GAWM	153, 221
abstract_inverted_index.MARL	126
abstract_inverted_index.This	53, 72, 108, 179, 203
abstract_inverted_index.also	187
abstract_inverted_index.data	28, 83, 97
abstract_inverted_index.from	165
abstract_inverted_index.fuse	161
abstract_inverted_index.more	114, 213
abstract_inverted_index.only	182
abstract_inverted_index.over	12
abstract_inverted_index.real	106
abstract_inverted_index.that	220
abstract_inverted_index.this	120
abstract_inverted_index.GAWM,	129
abstract_inverted_index.SMAC.	236
abstract_inverted_index.final	48
abstract_inverted_index.issue	109
abstract_inverted_index.local	162
abstract_inverted_index.model	103
abstract_inverted_index.size,	37
abstract_inverted_index.state	145, 177
abstract_inverted_index.still	40
abstract_inverted_index.terms	16, 46
abstract_inverted_index.these	38
abstract_inverted_index.using	21
abstract_inverted_index.which	130
abstract_inverted_index.while	147
abstract_inverted_index.world	25, 59, 102, 134
abstract_inverted_index.(MARL)	7
abstract_inverted_index.behind	42
abstract_inverted_index.called	128
abstract_inverted_index.ensure	78
abstract_inverted_index.global	66, 79, 176
abstract_inverted_index.method	127
abstract_inverted_index.models	26
abstract_inverted_index.pseudo	96
abstract_inverted_index.recent	1
abstract_inverted_index.sample	18, 29, 36, 184
abstract_inverted_index.states	67
abstract_inverted_index.years,	2
abstract_inverted_index.ability	76, 136, 171
abstract_inverted_index.achieve	138
abstract_inverted_index.address	119
abstract_inverted_index.agents,	167
abstract_inverted_index.applied	211
abstract_inverted_index.becomes	110
abstract_inverted_index.between	94
abstract_inverted_index.complex	115, 198, 214
abstract_inverted_index.domains	234
abstract_inverted_index.enables	205
abstract_inverted_index.extract	173
abstract_inverted_index.hampers	74
abstract_inverted_index.leading	191
abstract_inverted_index.limited	35
abstract_inverted_index.methods	14, 39, 44, 207
abstract_inverted_index.model's	60, 135
abstract_inverted_index.propose	123
abstract_inverted_index.results	86, 218
abstract_inverted_index.samples	84, 98
abstract_inverted_index.thereby	168
abstract_inverted_index.unified	140
abstract_inverted_index.various	223
abstract_inverted_index.without	32
abstract_inverted_index.However,	31
abstract_inverted_index.Learning	6
abstract_inverted_index.accurate	142
abstract_inverted_index.adhering	148
abstract_inverted_index.dynamics	24
abstract_inverted_index.enhances	131, 188
abstract_inverted_index.globally	139
abstract_inverted_index.improves	183
abstract_inverted_index.mismatch	93
abstract_inverted_index.samples.	107
abstract_inverted_index.superior	193
abstract_inverted_index.training	189
abstract_inverted_index.uniquely	154
abstract_inverted_index.unstable	63, 91
abstract_inverted_index.achieving	228
abstract_inverted_index.different	166
abstract_inverted_index.generated	99
abstract_inverted_index.improving	169
abstract_inverted_index.leverages	155
abstract_inverted_index.paradigm.	152
abstract_inverted_index.partially	69
abstract_inverted_index.primarily	55
abstract_inverted_index.represent	175
abstract_inverted_index.additional	157
abstract_inverted_index.advantages	11
abstract_inverted_index.challenge,	121
abstract_inverted_index.efficiency	19, 185
abstract_inverted_index.limitation	73
abstract_inverted_index.model-free	13, 43, 224
abstract_inverted_index.observable	70
abstract_inverted_index.pronounced	112
abstract_inverted_index.stability,	190
abstract_inverted_index.stability.	52
abstract_inverted_index.Model-based	3
abstract_inverted_index.Multi-Agent	4
abstract_inverted_index.Transformer	158
abstract_inverted_index.advancement	204
abstract_inverted_index.approaches,	227
abstract_inverted_index.centralized	133
abstract_inverted_index.challenging	200, 233
abstract_inverted_index.considering	33
abstract_inverted_index.consistency	80
abstract_inverted_index.convergence	49, 194
abstract_inverted_index.demonstrate	219
abstract_inverted_index.effectively	210
abstract_inverted_index.enhancement	180
abstract_inverted_index.environment	23
abstract_inverted_index.exceptional	229
abstract_inverted_index.independent	22
abstract_inverted_index.information	146, 164
abstract_inverted_index.model-based	125, 206, 226
abstract_inverted_index.multi-agent	116, 201, 215
abstract_inverted_index.observation	163
abstract_inverted_index.outperforms	222
abstract_inverted_index.performance	50, 230
abstract_inverted_index.significant	10
abstract_inverted_index.Experimental	217
abstract_inverted_index.architecture	159
abstract_inverted_index.demonstrated	9
abstract_inverted_index.distribution	92
abstract_inverted_index.information.	178
abstract_inverted_index.insufficient	61
abstract_inverted_index.particularly	111, 196
abstract_inverted_index.performance,	195
abstract_inverted_index.time-varying	89
abstract_inverted_index.Reinforcement	5
abstract_inverted_index.augmentation.	30
abstract_inverted_index.environments.	71, 117, 202, 216
abstract_inverted_index.representation	64, 143
cited_by_percentile_year
countries_distinct_count	0
institutions_distinct_count	6
citation_normalized_percentile
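
The `abstract_inverted_index.*` rows in the payload above store the abstract as an inverted index: each key is a token and each value lists the zero-based word positions where that token occurs. A minimal sketch of turning such an index back into running text follows. The function name `rebuild_abstract` and the dictionary form of the input are assumptions for illustration, since the payload here is flattened into tab-separated rows. The sample holds the first nine tokens from the rows above.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a mapping of token -> list of word positions."""
    position_to_token = {}
    for token, positions in inverted_index.items():
        for position in positions:
            position_to_token[position] = token
    # Emit tokens in ascending position order, separated by single spaces.
    return " ".join(position_to_token[i] for i in sorted(position_to_token))


# First nine tokens, copied from the payload rows above.
sample = {
    "In": [0],
    "recent": [1],
    "years,": [2],
    "Model-based": [3],
    "Multi-Agent": [4],
    "Reinforcement": [5],
    "Learning": [6],
    "(MARL)": [7],
    "has": [8],
}

if __name__ == "__main__":
    print(rebuild_abstract(sample))
    # prints: In recent years, Model-based Multi-Agent Reinforcement Learning (MARL) has
```

Tokens keep their attached punctuation (for example `years,`), so joining with single spaces should recover the original wording without further cleanup.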