B2MAPO: A Batch-by-Batch Multi-Agent Policy Optimization to Balance Performance and Efficiency
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2407.15077
Most multi-agent reinforcement learning approaches adopt one of two types of policy optimization methods, updating policies either simultaneously or sequentially. Simultaneously updating the policies of all agents introduces a non-stationarity problem. Although sequentially updating policies agent-by-agent in an appropriate order improves policy performance, it is prone to low efficiency due to sequential execution, resulting in longer model training and execution time. Intuitively, partitioning the policies of all agents according to their interdependence and updating the joint policy batch-by-batch can effectively balance performance and efficiency. However, determining the optimal batch partition of policies and the batch updating order are challenging problems. Firstly, a sequential batched policy updating scheme, B2MAPO (Batch-by-Batch Multi-Agent Policy Optimization), is proposed with a theoretical guarantee of a monotonic, incrementally tightened bound. Secondly, a universal, modularized, plug-and-play B2MAPO hierarchical framework, which satisfies the CTDE (centralized training with decentralized execution) principle, is designed to conveniently integrate any MARL models and to fully exploit and merge their merits, including policy optimality and inference efficiency. Next, a DAG-based B2MAPO algorithm is devised as a carefully designed implementation of the B2MAPO framework. Comprehensive experimental results on the StarCraft II Multi-Agent Challenge and Google Research Football demonstrate that the DAG-based B2MAPO algorithm outperforms baseline methods. Meanwhile, compared with A2PO, our algorithm reduces model training and execution time by 60.4% and 78.7%, respectively.
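
The idea of grouping agents by interdependence and updating them batch-by-batch can be illustrated with a small sketch. The abstract does not say how the DAG-based algorithm builds its graph or chooses its batches, so everything below is an assumption made for illustration: the function name `partition_into_batches`, the hand-written dependency edges, and the use of plain topological layering (Kahn's algorithm) as the partitioning rule. It is not the authors' method.

```python
from collections import defaultdict


def partition_into_batches(num_agents, edges):
    """Layer a dependency DAG into ordered batches of agents.

    edges: iterable of (u, v) pairs, read here as "agent v's update
    depends on agent u" (a hypothetical encoding of interdependence).
    Agents inside one batch have no edges between them, so they could
    be updated together; batches themselves are processed in order.
    """
    indegree = [0] * num_agents
    children = defaultdict(list)
    for u, v in edges:
        children[u].append(v)
        indegree[v] += 1

    # Agents with no unmet dependencies form the first batch.
    frontier = [a for a in range(num_agents) if indegree[a] == 0]
    batches = []
    placed = 0
    while frontier:
        batches.append(sorted(frontier))
        placed += len(frontier)
        next_frontier = []
        for u in frontier:
            for v in children[u]:
                indegree[v] -= 1
                if indegree[v] == 0:
                    next_frontier.append(v)
        frontier = next_frontier

    if placed != num_agents:
        raise ValueError("dependency graph contains a cycle")
    return batches


if __name__ == "__main__":
    # Hypothetical 5-agent example: agent 2 depends on 0 and 1,
    # agent 3 depends on 2, agent 4 depends on 1.
    toy_edges = [(0, 2), (1, 2), (2, 3), (1, 4)]
    print(partition_into_batches(5, toy_edges))  # [[0, 1], [2, 4], [3]]
```

With five agents this yields three sequential update steps instead of five, which is the kind of trade-off between fully simultaneous and fully sequential updating that the abstract describes.
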
Related Topics
- Type
- preprint
- Language
- en
- Landing Page
- http://arxiv.org/abs/2407.15077
- https://arxiv.org/pdf/2407.15077
- OA Status
- green
- Related Works
- 10
- OpenAlex ID
- https://openalex.org/W4401200984
Raw OpenAlex JSON
- OpenAlex ID
- https://openalex.org/W4401200984 (canonical identifier for this work in OpenAlex)
- DOI
- https://doi.org/10.48550/arxiv.2407.15077 (Digital Object Identifier)
- Title
- B2MAPO: A Batch-by-Batch Multi-Agent Policy Optimization to Balance Performance and Efficiency (work title)
- Type
- preprint (OpenAlex work type)
- Language
- en (primary language)
- Publication year
- 2024 (year of publication)
- Publication date
- 2024-07-21 (full publication date if available)
- Authors
- Wenjing Zhang, Wei Zhang, Wenqing Hu, Yifan Wang (list of authors in order)
- Landing page
- https://arxiv.org/abs/2407.15077 (publisher landing page)
- PDF URL
- https://arxiv.org/pdf/2407.15077 (direct link to full text PDF)
- Open access
- Yes (whether a free full text is available)
- OA status
- green (open access status per OpenAlex)
- OA URL
- https://arxiv.org/pdf/2407.15077 (direct OA link when available)
- Concepts
- Balance (ability), Batch processing, Computer science, Economics, Psychology, Operating system, Neuroscience (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by
- 0 (total citation count in OpenAlex)
- Related works (count)
- 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4401200984 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2407.15077 |
| ids.doi | https://doi.org/10.48550/arxiv.2407.15077 |
| ids.openalex | https://openalex.org/W4401200984 |
| fwci | |
| type | preprint |
| title | B2MAPO: A Batch-by-Batch Multi-Agent Policy Optimization to Balance Performance and Efficiency |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11182 |
| topics[0].field.id | https://openalex.org/fields/18 |
| topics[0].field.display_name | Decision Sciences |
| topics[0].score | 0.8562999963760376 |
| topics[0].domain.id | https://openalex.org/domains/2 |
| topics[0].domain.display_name | Social Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1803 |
| topics[0].subfield.display_name | Management Science and Operations Research |
| topics[0].display_name | Auction Theory and Applications |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C168031717 |
| concepts[0].level | 2 |
| concepts[0].score | 0.6582053303718567 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q1530280 |
| concepts[0].display_name | Balance (ability) |
| concepts[1].id | https://openalex.org/C172658912 |
| concepts[1].level | 2 |
| concepts[1].score | 0.4578072726726532 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q661613 |
| concepts[1].display_name | Batch processing |
| concepts[2].id | https://openalex.org/C41008148 |
| concepts[2].level | 0 |
| concepts[2].score | 0.41064712405204773 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[2].display_name | Computer science |
| concepts[3].id | https://openalex.org/C162324750 |
| concepts[3].level | 0 |
| concepts[3].score | 0.32853513956069946 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q8134 |
| concepts[3].display_name | Economics |
| concepts[4].id | https://openalex.org/C15744967 |
| concepts[4].level | 0 |
| concepts[4].score | 0.10647252202033997 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q9418 |
| concepts[4].display_name | Psychology |
| concepts[5].id | https://openalex.org/C111919701 |
| concepts[5].level | 1 |
| concepts[5].score | 0.08101996779441833 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q9135 |
| concepts[5].display_name | Operating system |
| concepts[6].id | https://openalex.org/C169760540 |
| concepts[6].level | 1 |
| concepts[6].score | 0.0 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q207011 |
| concepts[6].display_name | Neuroscience |
| keywords[0].id | https://openalex.org/keywords/balance |
| keywords[0].score | 0.6582053303718567 |
| keywords[0].display_name | Balance (ability) |
| keywords[1].id | https://openalex.org/keywords/batch-processing |
| keywords[1].score | 0.4578072726726532 |
| keywords[1].display_name | Batch processing |
| keywords[2].id | https://openalex.org/keywords/computer-science |
| keywords[2].score | 0.41064712405204773 |
| keywords[2].display_name | Computer science |
| keywords[3].id | https://openalex.org/keywords/economics |
| keywords[3].score | 0.32853513956069946 |
| keywords[3].display_name | Economics |
| keywords[4].id | https://openalex.org/keywords/psychology |
| keywords[4].score | 0.10647252202033997 |
| keywords[4].display_name | Psychology |
| keywords[5].id | https://openalex.org/keywords/operating-system |
| keywords[5].score | 0.08101996779441833 |
| keywords[5].display_name | Operating system |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2407.15077 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2407.15077 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2407.15077 |
| locations[1].id | doi:10.48550/arxiv.2407.15077 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2407.15077 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5100407210 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-6694-6072 |
| authorships[0].author.display_name | Wenjing Zhang |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Zhang, Wenjing |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5100682082 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-2017-5564 |
| authorships[1].author.display_name | Wei Zhang |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Zhang, Wei |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5106066810 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Wenqing Hu |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Hu, Wenqing |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5107959293 |
| authorships[3].author.orcid | https://orcid.org/0009-0005-9457-7592 |
| authorships[3].author.display_name | Yifan Wang |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Wang, Yifan |
| authorships[3].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2407.15077 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2024-08-01T00:00:00 |
| display_name | B2MAPO: A Batch-by-Batch Multi-Agent Policy Optimization to Balance Performance and Efficiency |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11182 |
| primary_topic.field.id | https://openalex.org/fields/18 |
| primary_topic.field.display_name | Decision Sciences |
| primary_topic.score | 0.8562999963760376 |
| primary_topic.domain.id | https://openalex.org/domains/2 |
| primary_topic.domain.display_name | Social Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1803 |
| primary_topic.subfield.display_name | Management Science and Operations Research |
| primary_topic.display_name | Auction Theory and Applications |
| related_works | https://openalex.org/W2748952813, https://openalex.org/W4391375266, https://openalex.org/W2390279801, https://openalex.org/W2358668433, https://openalex.org/W4396701345, https://openalex.org/W2376932109, https://openalex.org/W2001405890, https://openalex.org/W4396696052, https://openalex.org/W2382290278, https://openalex.org/W4395014643 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2407.15077 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2407.15077 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2407.15077 |
| primary_location.id | pmh:oai:arXiv.org:2407.15077 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2407.15077 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2407.15077 |
| publication_date | 2024-07-21 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 97, 113, 123, 156, 164 |
| abstract_inverted_index.an | 34 |
| abstract_inverted_index.by | 105, 206 |
| abstract_inverted_index.in | 33, 51 |
| abstract_inverted_index.is | 41, 110, 134, 160, 163 |
| abstract_inverted_index.it | 40 |
| abstract_inverted_index.of | 8, 22, 61, 87, 116, 168, 186 |
| abstract_inverted_index.on | 175 |
| abstract_inverted_index.or | 17 |
| abstract_inverted_index.to | 43, 47, 65, 81, 136, 142 |
| abstract_inverted_index.all | 23, 62 |
| abstract_inverted_index.and | 55, 68, 77, 89, 145, 152, 179, 203, 208 |
| abstract_inverted_index.any | 139 |
| abstract_inverted_index.are | 93 |
| abstract_inverted_index.can | 73 |
| abstract_inverted_index.due | 46 |
| abstract_inverted_index.how | 80 |
| abstract_inverted_index.low | 44 |
| abstract_inverted_index.our | 197 |
| abstract_inverted_index.the | 83, 117, 184, 200 |
| abstract_inverted_index.two | 6 |
| abstract_inverted_index.CTDE | 132 |
| abstract_inverted_index.MARL | 140 |
| abstract_inverted_index.Most | 0 |
| abstract_inverted_index.that | 12 |
| abstract_inverted_index.time | 205 |
| abstract_inverted_index.with | 112, 195 |
| abstract_inverted_index.60.4% | 207 |
| abstract_inverted_index.A2PO, | 196 |
| abstract_inverted_index.Batch | 106 |
| abstract_inverted_index.Next, | 155 |
| abstract_inverted_index.adopt | 5 |
| abstract_inverted_index.batch | 85, 90 |
| abstract_inverted_index.fully | 143 |
| abstract_inverted_index.joint | 70 |
| abstract_inverted_index.merge | 146 |
| abstract_inverted_index.model | 53, 201 |
| abstract_inverted_index.order | 36, 92 |
| abstract_inverted_index.prone | 42 |
| abstract_inverted_index.their | 66, 147 |
| abstract_inverted_index.time. | 57 |
| abstract_inverted_index.types | 7 |
| abstract_inverted_index.which | 130, 162 |
| abstract_inverted_index.(Batch | 104 |
| abstract_inverted_index.78.7%, | 209 |
| abstract_inverted_index.B2MAPO | 103, 127, 158, 169, 188 |
| abstract_inverted_index.Google | 180 |
| abstract_inverted_index.Policy | 108 |
| abstract_inverted_index.agents | 24, 63 |
| abstract_inverted_index.bound. | 121 |
| abstract_inverted_index.either | 13 |
| abstract_inverted_index.longer | 52 |
| abstract_inverted_index.models | 141 |
| abstract_inverted_index.policy | 9, 15, 38, 71, 100, 150 |
| abstract_inverted_index.update | 14 |
| abstract_inverted_index.balance | 75 |
| abstract_inverted_index.batched | 99 |
| abstract_inverted_index.exploit | 144 |
| abstract_inverted_index.merits, | 148 |
| abstract_inverted_index.methods | 11 |
| abstract_inverted_index.optimal | 84 |
| abstract_inverted_index.reduces | 199 |
| abstract_inverted_index.results | 173 |
| abstract_inverted_index.scheme, | 102 |
| abstract_inverted_index.Although | 28 |
| abstract_inverted_index.Firstly, | 96 |
| abstract_inverted_index.Football | 181 |
| abstract_inverted_index.However, | 79 |
| abstract_inverted_index.Research | 182 |
| abstract_inverted_index.baseline | 191 |
| abstract_inverted_index.compared | 194 |
| abstract_inverted_index.designed | 135, 166 |
| abstract_inverted_index.devised, | 161 |
| abstract_inverted_index.improves | 37 |
| abstract_inverted_index.learning | 3 |
| abstract_inverted_index.methods. | 192 |
| abstract_inverted_index.policies | 21, 31, 60, 88 |
| abstract_inverted_index.problem. | 27 |
| abstract_inverted_index.proposed | 111 |
| abstract_inverted_index.training | 54, 202 |
| abstract_inverted_index.updating | 20, 30, 69, 91, 101 |
| abstract_inverted_index.Challenge | 178 |
| abstract_inverted_index.DAG-based | 157, 187 |
| abstract_inverted_index.Secondly, | 122 |
| abstract_inverted_index.according | 64 |
| abstract_inverted_index.algorithm | 159, 189, 198 |
| abstract_inverted_index.carefully | 165 |
| abstract_inverted_index.conducted | 174 |
| abstract_inverted_index.determine | 82 |
| abstract_inverted_index.execution | 56, 204 |
| abstract_inverted_index.guarantee | 115 |
| abstract_inverted_index.including | 149 |
| abstract_inverted_index.inference | 153 |
| abstract_inverted_index.integrate | 138 |
| abstract_inverted_index.modulized | 125 |
| abstract_inverted_index.monotonic | 118 |
| abstract_inverted_index.partition | 86 |
| abstract_inverted_index.problems. | 95 |
| abstract_inverted_index.resulting | 50 |
| abstract_inverted_index.satisfies | 131 |
| abstract_inverted_index.tightened | 120 |
| abstract_inverted_index.universal | 124 |
| abstract_inverted_index.Meanwhile, | 193 |
| abstract_inverted_index.approaches | 4 |
| abstract_inverted_index.efficiency | 45 |
| abstract_inverted_index.execution, | 49 |
| abstract_inverted_index.framework, | 129 |
| abstract_inverted_index.framework. | 170 |
| abstract_inverted_index.introduces | 25 |
| abstract_inverted_index.optimality | 151 |
| abstract_inverted_index.principle, | 133 |
| abstract_inverted_index.sequential | 48, 98 |
| abstract_inverted_index.Multi-Agent | 107 |
| abstract_inverted_index.Multi-agent | 177 |
| abstract_inverted_index.StarCraftII | 176 |
| abstract_inverted_index.appropriate | 35 |
| abstract_inverted_index.challenging | 94 |
| abstract_inverted_index.demonstrate | 183 |
| abstract_inverted_index.effectively | 74 |
| abstract_inverted_index.efficiency. | 78, 154 |
| abstract_inverted_index.multi-agent | 1 |
| abstract_inverted_index.outperforms | 190 |
| abstract_inverted_index.performance | 76, 185 |
| abstract_inverted_index.theoretical | 114 |
| abstract_inverted_index.Intuitively, | 58 |
| abstract_inverted_index.conveniently | 137 |
| abstract_inverted_index.experimental | 172 |
| abstract_inverted_index.hierarchical | 128 |
| abstract_inverted_index.optimization | 10 |
| abstract_inverted_index.partitioning | 59 |
| abstract_inverted_index.performance, | 39 |
| abstract_inverted_index.sequentially | 29 |
| abstract_inverted_index.Comprehensive | 171 |
| abstract_inverted_index.incrementally | 119 |
| abstract_inverted_index.plug-and-play | 126 |
| abstract_inverted_index.reinforcement | 2 |
| abstract_inverted_index.respectively. | 210 |
| abstract_inverted_index.sequentially. | 18 |
| abstract_inverted_index.Optimization), | 109 |
| abstract_inverted_index.Simultaneously | 19 |
| abstract_inverted_index.agent-by-agent | 32 |
| abstract_inverted_index.batch-by-batch | 72 |
| abstract_inverted_index.implementation | 167 |
| abstract_inverted_index.simultaneously | 16 |
| abstract_inverted_index.interdependence | 67 |
| abstract_inverted_index.non-stationarity | 26 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 4 |
| citation_normalized_percentile | |
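
The `abstract_inverted_index.*` rows in the table above store the abstract as a map from each word to the positions where it occurs. A minimal sketch of turning such a map back into running text follows. The function name `rebuild_abstract` is hypothetical, and the sample dictionary is a deliberately truncated subset (positions 0 to 7 only) copied from the rows above, not the full index.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a word -> list-of-positions mapping."""
    words_by_position = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            words_by_position[pos] = word
    # Join the words in ascending position order.
    return " ".join(words_by_position[p] for p in sorted(words_by_position))


if __name__ == "__main__":
    # Truncated sample taken from the table rows above (positions 0-7).
    sample = {
        "Most": [0],
        "multi-agent": [1],
        "reinforcement": [2],
        "learning": [3],
        "approaches": [4],
        "adopt": [5],
        "two": [6],
        "types": [7],
    }
    print(rebuild_abstract(sample))
    # Most multi-agent reinforcement learning approaches adopt two types
```

Words that occur several times, such as `policy` at positions 9, 15, 38, 71, 100, and 150, simply contribute one entry per position, so the same loop handles them without special casing.
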