Learning Generalizable Skills from Offline Multi-Task Data for Multi-Agent Cooperation
2025 · Open Access
· DOI: https://doi.org/10.48550/arxiv.2503.21200
Learning a cooperative multi-agent policy from offline multi-task data that generalizes to unseen tasks with varying numbers of agents and targets is an attractive problem in many scenarios. Aggregating general behavior patterns across multiple tasks into skills is a promising way to improve policy transfer, but two primary challenges hinder further progress in skill learning for offline multi-task multi-agent reinforcement learning (MARL). First, methods that extract general cooperative behaviors from various action sequences as common skills fail to incorporate cooperative temporal knowledge into those skills. Second, existing works involve only common skills and cannot adaptively select independent knowledge as task-specific skills in each task for fine-grained action execution. To tackle these challenges, we propose Hierarchical and Separate Skill Discovery (HiSSD), a novel approach for generalizable offline multi-task MARL through skill learning. HiSSD uses a hierarchical framework that jointly learns common and task-specific skills. The common skills learn cooperative temporal knowledge and enable in-sample exploitation for offline multi-task MARL. The task-specific skills represent the priors of each task and achieve task-guided, fine-grained action execution. To evaluate our method, we conduct experiments on the multi-agent MuJoCo and SMAC benchmarks. After training the policy with HiSSD on offline multi-task data, the empirical results show that HiSSD assigns effective cooperative behaviors and obtains superior performance on unseen tasks.
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2503.21200
- PDF: https://arxiv.org/pdf/2503.21200
- OA Status: green
- OpenAlex ID: https://openalex.org/W4415062185
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4415062185
- DOI: https://doi.org/10.48550/arxiv.2503.21200
- Title: Learning Generalizable Skills from Offline Multi-Task Data for Multi-Agent Cooperation
- Type: preprint
- Language: en
- Publication year: 2025
- Publication date: 2025-03-27
- Authors: Sicong Liu, Yang Shu, Chenjuan Guo, Bin Yang
- Landing page: https://arxiv.org/abs/2503.21200
- PDF URL: https://arxiv.org/pdf/2503.21200
- Open access: Yes
- OA status: green
- OA URL: https://arxiv.org/pdf/2503.21200
- Cited by: 0
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4415062185 |
| doi | https://doi.org/10.48550/arxiv.2503.21200 |
| ids.doi | https://doi.org/10.48550/arxiv.2503.21200 |
| ids.openalex | https://openalex.org/W4415062185 |
| fwci | |
| type | preprint |
| title | Learning Generalizable Skills from Offline Multi-Task Data for Multi-Agent Cooperation |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11303 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.97079998254776 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Bayesian Modeling and Causal Inference |
| topics[1].id | https://openalex.org/T11106 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9470000267028809 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1711 |
| topics[1].subfield.display_name | Signal Processing |
| topics[1].display_name | Data Management and Algorithms |
| topics[2].id | https://openalex.org/T12761 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.941100001335144 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Data Stream Mining Techniques |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2503.21200 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2503.21200 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2503.21200 |
| locations[1].id | doi:10.48550/arxiv.2503.21200 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2503.21200 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5028339258 |
| authorships[0].author.orcid | https://orcid.org/0000-0003-1078-7006 |
| authorships[0].author.display_name | Sicong Liu |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Liu, Sicong |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5057100441 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-8304-1531 |
| authorships[1].author.display_name | Yang Shu |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Shu, Yang |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5084021933 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-4516-4637 |
| authorships[2].author.display_name | Chenjuan Guo |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Guo, Chenjuan |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5101717968 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-8322-117X |
| authorships[3].author.display_name | Bin Yang |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Yang, Bin |
| authorships[3].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2503.21200 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-11T00:00:00 |
| display_name | Learning Generalizable Skills from Offline Multi-Task Data for Multi-Agent Cooperation |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11303 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.97079998254776 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Bayesian Modeling and Causal Inference |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2503.21200 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2503.21200 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2503.21200 |
| primary_location.id | pmh:oai:arXiv.org:2503.21200 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2503.21200 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2503.21200 |
| publication_date | 2025-03-27 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 43, 115, 128, 164 |
| abstract_inverted_index.To | 103, 169 |
| abstract_inverted_index.an | 22 |
| abstract_inverted_index.as | 36, 69, 93 |
| abstract_inverted_index.in | 25, 56, 96, 209 |
| abstract_inverted_index.is | 21, 42 |
| abstract_inverted_index.of | 17, 53, 159, 173 |
| abstract_inverted_index.on | 179, 191 |
| abstract_inverted_index.to | 11, 38 |
| abstract_inverted_index.we | 107, 176 |
| abstract_inverted_index.The | 138, 153 |
| abstract_inverted_index.and | 19, 86, 110, 135, 145, 162, 182, 205 |
| abstract_inverted_index.can | 9, 87 |
| abstract_inverted_index.for | 99, 118, 149 |
| abstract_inverted_index.not | 88 |
| abstract_inverted_index.our | 174 |
| abstract_inverted_index.the | 50, 157, 171, 187, 195 |
| abstract_inverted_index.two | 46 |
| abstract_inverted_index.MARL | 122 |
| abstract_inverted_index.SMAC | 183 |
| abstract_inverted_index.data | 7 |
| abstract_inverted_index.each | 97, 160 |
| abstract_inverted_index.from | 4, 65 |
| abstract_inverted_index.into | 77 |
| abstract_inverted_index.many | 26 |
| abstract_inverted_index.only | 82 |
| abstract_inverted_index.show | 198 |
| abstract_inverted_index.task | 98, 161 |
| abstract_inverted_index.that | 8, 131, 199 |
| abstract_inverted_index.with | 14 |
| abstract_inverted_index.After | 185 |
| abstract_inverted_index.HiSSD | 126, 190, 200 |
| abstract_inverted_index.MARL. | 59, 152 |
| abstract_inverted_index.Skill | 112 |
| abstract_inverted_index.among | 33 |
| abstract_inverted_index.data, | 194 |
| abstract_inverted_index.lacks | 72 |
| abstract_inverted_index.learn | 141 |
| abstract_inverted_index.novel | 116 |
| abstract_inverted_index.skill | 54, 124 |
| abstract_inverted_index.tasks | 13, 35 |
| abstract_inverted_index.them. | 78 |
| abstract_inverted_index.these | 105 |
| abstract_inverted_index.using | 189 |
| abstract_inverted_index.works | 81 |
| abstract_inverted_index.MuJoCo | 181 |
| abstract_inverted_index.action | 67, 101, 167 |
| abstract_inverted_index.agents | 18 |
| abstract_inverted_index.choose | 90 |
| abstract_inverted_index.common | 70, 84, 134, 139 |
| abstract_inverted_index.enable | 146 |
| abstract_inverted_index.hinder | 49 |
| abstract_inverted_index.learns | 133 |
| abstract_inverted_index.policy | 3, 40, 188 |
| abstract_inverted_index.priors | 158 |
| abstract_inverted_index.skills | 37, 71, 85, 95, 140, 155 |
| abstract_inverted_index.tackle | 104 |
| abstract_inverted_index.tasks. | 211 |
| abstract_inverted_index.unseen | 12, 210 |
| abstract_inverted_index.verify | 170 |
| abstract_inverted_index.achieve | 163 |
| abstract_inverted_index.assigns | 201 |
| abstract_inverted_index.conduct | 177 |
| abstract_inverted_index.further | 51 |
| abstract_inverted_index.general | 30, 62 |
| abstract_inverted_index.improve | 39 |
| abstract_inverted_index.involve | 83 |
| abstract_inverted_index.jointly | 132 |
| abstract_inverted_index.method, | 175 |
| abstract_inverted_index.numbers | 16 |
| abstract_inverted_index.obtains | 206 |
| abstract_inverted_index.offline | 5, 57, 120, 150, 192 |
| abstract_inverted_index.primary | 47 |
| abstract_inverted_index.problem | 24 |
| abstract_inverted_index.propose | 108 |
| abstract_inverted_index.results | 197 |
| abstract_inverted_index.skills. | 137 |
| abstract_inverted_index.targets | 20 |
| abstract_inverted_index.through | 123 |
| abstract_inverted_index.various | 66 |
| abstract_inverted_index.varying | 15 |
| abstract_inverted_index.(HiSSD), | 114 |
| abstract_inverted_index.Although | 28 |
| abstract_inverted_index.Firstly, | 60 |
| abstract_inverted_index.Learning | 0 |
| abstract_inverted_index.Separate | 111 |
| abstract_inverted_index.approach | 117 |
| abstract_inverted_index.behavior | 31 |
| abstract_inverted_index.bringing | 73 |
| abstract_inverted_index.existing | 80 |
| abstract_inverted_index.learning | 55 |
| abstract_inverted_index.multiple | 34 |
| abstract_inverted_index.patterns | 32 |
| abstract_inverted_index.superior | 207 |
| abstract_inverted_index.temporal | 75, 143 |
| abstract_inverted_index.training | 186 |
| abstract_inverted_index.transfer | 41 |
| abstract_inverted_index.Discovery | 113 |
| abstract_inverted_index.Secondly, | 79 |
| abstract_inverted_index.approach, | 45 |
| abstract_inverted_index.behaviors | 64, 204 |
| abstract_inverted_index.effective | 202 |
| abstract_inverted_index.empirical | 196 |
| abstract_inverted_index.framework | 130 |
| abstract_inverted_index.in-sample | 147 |
| abstract_inverted_index.knowledge | 76, 92, 144 |
| abstract_inverted_index.learning. | 125 |
| abstract_inverted_index.leverages | 127 |
| abstract_inverted_index.promising | 44 |
| abstract_inverted_index.represent | 156 |
| abstract_inverted_index.sequences | 68 |
| abstract_inverted_index.adaptively | 89 |
| abstract_inverted_index.attractive | 23 |
| abstract_inverted_index.challenges | 48 |
| abstract_inverted_index.execution. | 102, 168 |
| abstract_inverted_index.extracting | 61 |
| abstract_inverted_index.generalize | 10 |
| abstract_inverted_index.multi-task | 6, 58, 121, 151, 193 |
| abstract_inverted_index.scenarios. | 27 |
| abstract_inverted_index.advancement | 52, 172 |
| abstract_inverted_index.aggregating | 29 |
| abstract_inverted_index.benchmarks. | 184 |
| abstract_inverted_index.challenges, | 106 |
| abstract_inverted_index.cooperative | 1, 63, 74, 142, 203 |
| abstract_inverted_index.experiments | 178 |
| abstract_inverted_index.independent | 91 |
| abstract_inverted_index.multi-agent | 2, 180 |
| abstract_inverted_index.performance | 208 |
| abstract_inverted_index.task-guided | 165 |
| abstract_inverted_index.Hierarchical | 109 |
| abstract_inverted_index.exploitation | 148 |
| abstract_inverted_index.fine-grained | 100, 166 |
| abstract_inverted_index.hierarchical | 129 |
| abstract_inverted_index.generalizable | 119 |
| abstract_inverted_index.task-specific | 94, 136, 154 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 4 |
| citation_normalized_percentile |
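
The `abstract_inverted_index.*` rows in the payload store the abstract as a map from each word to the positions where it occurs, rather than as running text. The following is a minimal sketch of how such a map could be turned back into a sentence, assuming the rows have already been loaded into a Python dict. The helper name `rebuild_abstract` and the `sample` dict are illustrative assumptions, not part of any OpenAlex client; `sample` holds only the entries for positions 0 to 7, copied from the table.

```python
def rebuild_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild running text from a word -> positions mapping (hypothetical helper)."""
    by_position: dict[int, str] = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = word
    # Emit words in ascending position order, separated by single spaces.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Subset of the payload rows, restricted to positions 0-7 of the abstract.
sample = {
    "Learning": [0],
    "cooperative": [1],
    "multi-agent": [2],
    "policy": [3],
    "from": [4],
    "offline": [5],
    "multi-task": [6],
    "data": [7],
}

text = rebuild_abstract(sample)
print(text)
```

Applied to the full set of rows, the same function should reproduce the abstract text, since every token (including trailing punctuation such as `MARL.`) is stored as its own key.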