GAWM: Global-Aware World Model for Multi-Agent Reinforcement Learning
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2501.10116
In recent years, model-based multi-agent reinforcement learning (MARL) has shown significant advantages over model-free methods in sample efficiency, using independent world models of the environment dynamics to augment the training data. However, when the limited sample budget is set aside, these methods still lag behind model-free methods in final convergence performance and stability. This is primarily due to the world model's insufficient and unstable representation of global states in partially observable environments. This limitation makes it hard to keep the data samples globally consistent and produces a time-varying, unstable distribution mismatch between the pseudo samples generated by the world model and the real samples. The issue becomes particularly pronounced in more complex multi-agent environments. To address this challenge, we propose GAWM, a model-based MARL method that strengthens the centralized world model's ability to form a globally unified and accurate representation of state information while adhering to the centralized training with decentralized execution (CTDE) paradigm. GAWM uniquely leverages an additional Transformer architecture to fuse local observations from different agents, improving its ability to extract and represent global state information. This improves not only sample efficiency but also training stability, leading to superior convergence performance, particularly in complex and challenging multi-agent environments, and allows model-based methods to be applied effectively to more complex multi-agent settings. Experimental results show that GAWM outperforms various model-free and model-based approaches, achieving exceptional performance on challenging StarCraft Multi-Agent Challenge (SMAC) scenarios.
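The abstract describes GAWM's key component only at a high level: an additional Transformer that fuses the local observations of different agents into a global state representation. The sketch below illustrates that idea in plain Python under loudly stated assumptions: a single attention head, no positional encoding, no feed-forward or normalization layers, and mean pooling over agents to produce one global vector. The function names, weight shapes, and dimensions are hypothetical; the abstract does not say how many layers or heads GAWM uses or how the fused vector feeds its world model.

```python
# Hypothetical sketch of Transformer-style observation fusion.
# This is NOT GAWM's actual architecture: it is a single-head
# self-attention layer plus mean pooling, for illustration only.
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (d_out x d_in) matrix by a length-d_in vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax (subtracts the max before exp)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_observations(obs: List[Vector], wq: Matrix, wk: Matrix, wv: Matrix) -> Vector:
    """Fuse per-agent local observations into one global feature vector.

    Every agent's observation attends to all agents' observations
    (scaled dot-product self-attention); the attended per-agent
    features are then mean-pooled across agents.
    """
    queries = [matvec(wq, o) for o in obs]
    keys = [matvec(wk, o) for o in obs]
    values = [matvec(wv, o) for o in obs]
    d_k = len(keys[0])
    d_v = len(values[0])

    attended = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted sum of all agents' value vectors for this query.
        attended.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)])

    # Mean pooling over agents yields a single "global" vector.
    n = len(attended)
    return [sum(a[j] for a in attended) / n for j in range(d_v)]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights standing in for learned parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    n_agents, obs_dim, hidden = 3, 4, 8  # hypothetical sizes
    obs = [[rng.random() for _ in range(obs_dim)] for _ in range(n_agents)]
    wq, wk, wv = (random_matrix(hidden, obs_dim, rng) for _ in range(3))
    global_feature = fuse_observations(obs, wq, wk, wv)
    print(len(global_feature))  # prints 8
```

Omitting positional encoding treats the agents as an unordered set, so the pooled vector does not depend on the order in which agents are listed. Whether GAWM makes the same choice is not stated in the abstract.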
- Type: preprint
- Language: en
- Landing page: http://arxiv.org/abs/2501.10116
- PDF: https://arxiv.org/pdf/2501.10116
- OA status: green
- Related works: 10
- OpenAlex ID: https://openalex.org/W4406603986
- Publication date: 2025-01-17
- Authors: Zifeng Shi, Meiqin Liu, Senlin Zhang, Ronghao Zheng, Shanling Dong, Ping Wei
- Concepts (OpenAlex): Reinforcement learning, Computer science, Reinforcement, Artificial intelligence, Psychology, Social psychology
- Cited by (OpenAlex): 0
Full OpenAlex payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4406603986 |
| doi | https://doi.org/10.48550/arxiv.2501.10116 |
| ids.doi | https://doi.org/10.48550/arxiv.2501.10116 |
| ids.openalex | https://openalex.org/W4406603986 |
| fwci | |
| type | preprint |
| title | GAWM: Global-Aware World Model for Multi-Agent Reinforcement Learning |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10462 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.7109000086784363 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Reinforcement Learning in Robotics |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C97541855 |
| concepts[0].level | 2 |
| concepts[0].score | 0.826128363609314 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q830687 |
| concepts[0].display_name | Reinforcement learning |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.5357286930084229 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C67203356 |
| concepts[2].level | 2 |
| concepts[2].score | 0.4641820788383484 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q1321905 |
| concepts[2].display_name | Reinforcement |
| concepts[3].id | https://openalex.org/C154945302 |
| concepts[3].level | 1 |
| concepts[3].score | 0.32758933305740356 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[3].display_name | Artificial intelligence |
| concepts[4].id | https://openalex.org/C15744967 |
| concepts[4].level | 0 |
| concepts[4].score | 0.15009543299674988 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q9418 |
| concepts[4].display_name | Psychology |
| concepts[5].id | https://openalex.org/C77805123 |
| concepts[5].level | 1 |
| concepts[5].score | 0.052991628646850586 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q161272 |
| concepts[5].display_name | Social psychology |
| keywords[0].id | https://openalex.org/keywords/reinforcement-learning |
| keywords[0].score | 0.826128363609314 |
| keywords[0].display_name | Reinforcement learning |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.5357286930084229 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/reinforcement |
| keywords[2].score | 0.4641820788383484 |
| keywords[2].display_name | Reinforcement |
| keywords[3].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[3].score | 0.32758933305740356 |
| keywords[3].display_name | Artificial intelligence |
| keywords[4].id | https://openalex.org/keywords/psychology |
| keywords[4].score | 0.15009543299674988 |
| keywords[4].display_name | Psychology |
| keywords[5].id | https://openalex.org/keywords/social-psychology |
| keywords[5].score | 0.052991628646850586 |
| keywords[5].display_name | Social psychology |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2501.10116 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2501.10116 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2501.10116 |
| locations[1].id | doi:10.48550/arxiv.2501.10116 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2501.10116 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5062190927 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Zifeng Shi |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Shi, Zifeng |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5033808402 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-0693-6574 |
| authorships[1].author.display_name | Meiqin Liu |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Liu, Meiqin |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5003643230 |
| authorships[2].author.orcid | https://orcid.org/0000-0001-5117-3110 |
| authorships[2].author.display_name | Senlin Zhang |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Zhang, Senlin |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5048107920 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-9095-5905 |
| authorships[3].author.display_name | Ronghao Zheng |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Zheng, Ronghao |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5004998505 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-1754-1829 |
| authorships[4].author.display_name | Shanling Dong |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Dong, Shanling |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5101947241 |
| authorships[5].author.orcid | https://orcid.org/0000-0002-8535-9527 |
| authorships[5].author.display_name | Ping Wei |
| authorships[5].author_position | last |
| authorships[5].raw_author_name | Wei, Ping |
| authorships[5].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2501.10116 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | GAWM: Global-Aware World Model for Multi-Agent Reinforcement Learning |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2899084033, https://openalex.org/W2748952813, https://openalex.org/W4310083477, https://openalex.org/W2328553770, https://openalex.org/W2920061524, https://openalex.org/W1977959518, https://openalex.org/W2038908348, https://openalex.org/W2107890255, https://openalex.org/W2106552856 |
| cited_by_count | 0 |
| locations_count | 2 |
| publication_date | 2025-01-17 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 6 |
| citation_normalized_percentile | |