Double Distillation Network for Multi-Agent Reinforcement Learning
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2502.03125
Multi-agent reinforcement learning typically employs a centralized training-decentralized execution (CTDE) framework to alleviate non-stationarity in the environment. However, partial observability during execution may cause agents to accumulate gap errors, impairing the training of effective collaborative policies. To overcome this challenge, we introduce the Double Distillation Network (DDN), which incorporates two distillation modules aimed at enhancing robust coordination and facilitating collaboration under constrained information. The external distillation module uses a global guiding network and a local policy network, employing distillation to reconcile the gap between global training and local execution. In addition, the internal distillation module introduces intrinsic rewards, drawn from state information, to enhance the exploration capabilities of agents. Extensive experiments demonstrate that DDN significantly improves performance across multiple scenarios.
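
The abstract does not give the loss functions, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. It assumes the external module can be approximated by a temperature-softened KL divergence between the action values of a global "teacher" network and a local "student" network, and that the intrinsic reward can be approximated by a mean squared prediction error over state features. The function names, the KL form, and the prediction-error form are all hypothetical illustrations and are not taken from the paper.

```python
import math


def softmax(values, temperature=1.0):
    """Convert raw scores into a probability distribution (numerically stable)."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_q, student_q, temperature=1.0):
    """ASSUMPTION: KL(teacher || student) over softened action distributions.

    teacher_q: action values from a network that sees global state.
    student_q: action values from a network that sees only local observations.
    """
    p = softmax(teacher_q, temperature)
    q = softmax(student_q, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def intrinsic_reward(target_features, predicted_features):
    """ASSUMPTION: mean squared prediction error used as an exploration bonus."""
    n = len(target_features)
    return sum((t - p) ** 2 for t, p in zip(target_features, predicted_features)) / n


if __name__ == "__main__":
    # Identical teacher and student values give zero distillation loss.
    print(distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
    # A student that disagrees with the teacher is penalised.
    print(distillation_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]) > 0.0)
    # Poorly predicted state features yield a larger bonus.
    print(intrinsic_reward([1.0, 2.0], [1.0, 4.0]))
```

In this sketch the student is pulled toward the teacher only through the loss value; how DDN actually weights or schedules its two modules is not stated in the abstract.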
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2502.03125, https://arxiv.org/pdf/2502.03125
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4407221737
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4407221737 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2502.03125 (Digital Object Identifier)
- Title: Double Distillation Network for Multi-Agent Reinforcement Learning (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025 (year of publication)
- Publication date: 2025-02-05 (full publication date if available)
- Authors: Yang Zhou, Siying Wang, Wenyu Chen, Ruoning Zhang, Zhitong Zhao, Zixuan Zhang (list of authors in order)
- Landing page: https://arxiv.org/abs/2502.03125 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2502.03125 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2502.03125 (direct OA link when available)
- Concepts: Reinforcement learning, Distillation, Reinforcement, Computer science, Artificial intelligence, Materials science, Chemistry, Chromatography, Composite material (top concepts (fields/topics) attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4407221737 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2502.03125 |
| ids.doi | https://doi.org/10.48550/arxiv.2502.03125 |
| ids.openalex | https://openalex.org/W4407221737 |
| fwci | |
| type | preprint |
| title | Double Distillation Network for Multi-Agent Reinforcement Learning |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10462 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.3725999891757965 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Reinforcement Learning in Robotics |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C97541855 |
| concepts[0].level | 2 |
| concepts[0].score | 0.7699376344680786 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q830687 |
| concepts[0].display_name | Reinforcement learning |
| concepts[1].id | https://openalex.org/C204030448 |
| concepts[1].level | 2 |
| concepts[1].score | 0.7101317048072815 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q101017 |
| concepts[1].display_name | Distillation |
| concepts[2].id | https://openalex.org/C67203356 |
| concepts[2].level | 2 |
| concepts[2].score | 0.6033043265342712 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q1321905 |
| concepts[2].display_name | Reinforcement |
| concepts[3].id | https://openalex.org/C41008148 |
| concepts[3].level | 0 |
| concepts[3].score | 0.4730341136455536 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[3].display_name | Computer science |
| concepts[4].id | https://openalex.org/C154945302 |
| concepts[4].level | 1 |
| concepts[4].score | 0.3441741168498993 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[4].display_name | Artificial intelligence |
| concepts[5].id | https://openalex.org/C192562407 |
| concepts[5].level | 0 |
| concepts[5].score | 0.16790443658828735 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q228736 |
| concepts[5].display_name | Materials science |
| concepts[6].id | https://openalex.org/C185592680 |
| concepts[6].level | 0 |
| concepts[6].score | 0.14778411388397217 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q2329 |
| concepts[6].display_name | Chemistry |
| concepts[7].id | https://openalex.org/C43617362 |
| concepts[7].level | 1 |
| concepts[7].score | 0.14454400539398193 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q170050 |
| concepts[7].display_name | Chromatography |
| concepts[8].id | https://openalex.org/C159985019 |
| concepts[8].level | 1 |
| concepts[8].score | 0.08949831128120422 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q181790 |
| concepts[8].display_name | Composite material |
| keywords[0].id | https://openalex.org/keywords/reinforcement-learning |
| keywords[0].score | 0.7699376344680786 |
| keywords[0].display_name | Reinforcement learning |
| keywords[1].id | https://openalex.org/keywords/distillation |
| keywords[1].score | 0.7101317048072815 |
| keywords[1].display_name | Distillation |
| keywords[2].id | https://openalex.org/keywords/reinforcement |
| keywords[2].score | 0.6033043265342712 |
| keywords[2].display_name | Reinforcement |
| keywords[3].id | https://openalex.org/keywords/computer-science |
| keywords[3].score | 0.4730341136455536 |
| keywords[3].display_name | Computer science |
| keywords[4].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[4].score | 0.3441741168498993 |
| keywords[4].display_name | Artificial intelligence |
| keywords[5].id | https://openalex.org/keywords/materials-science |
| keywords[5].score | 0.16790443658828735 |
| keywords[5].display_name | Materials science |
| keywords[6].id | https://openalex.org/keywords/chemistry |
| keywords[6].score | 0.14778411388397217 |
| keywords[6].display_name | Chemistry |
| keywords[7].id | https://openalex.org/keywords/chromatography |
| keywords[7].score | 0.14454400539398193 |
| keywords[7].display_name | Chromatography |
| keywords[8].id | https://openalex.org/keywords/composite-material |
| keywords[8].score | 0.08949831128120422 |
| keywords[8].display_name | Composite material |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2502.03125 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2502.03125 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2502.03125 |
| locations[1].id | doi:10.48550/arxiv.2502.03125 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2502.03125 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5036552413 |
| authorships[0].author.orcid | https://orcid.org/0000-0003-0377-4338 |
| authorships[0].author.display_name | Yang Zhou |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Zhou, Yang |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5002463311 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-3180-0815 |
| authorships[1].author.display_name | Siying Wang |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Wang, Siying |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5100687323 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-9933-8014 |
| authorships[2].author.display_name | Wenyu Chen |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Chen, Wenyu |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5102593631 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Ruoning Zhang |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Zhang, Ruoning |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5005812905 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-4882-4063 |
| authorships[4].author.display_name | Zhitong Zhao |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Zhao, Zhitong |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5100321161 |
| authorships[5].author.orcid | https://orcid.org/0000-0001-9987-1137 |
| authorships[5].author.display_name | Zixuan Zhang |
| authorships[5].author_position | last |
| authorships[5].raw_author_name | Zhang, Zixuan |
| authorships[5].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2502.03125 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Double Distillation Network for Multi-Agent Reinforcement Learning |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10462 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.3725999891757965 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Reinforcement Learning in Robotics |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2899084033, https://openalex.org/W2748952813, https://openalex.org/W4310083477, https://openalex.org/W2328553770, https://openalex.org/W2920061524, https://openalex.org/W1977959518, https://openalex.org/W2038908348, https://openalex.org/W2107890255, https://openalex.org/W2106552856 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2502.03125 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2502.03125 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2502.03125 |
| primary_location.id | pmh:oai:arXiv.org:2502.03125 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2502.03125 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2502.03125 |
| publication_date | 2025-02-05 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 5, 73, 78 |
| abstract_inverted_index.In | 94 |
| abstract_inverted_index.To | 39 |
| abstract_inverted_index.at | 56 |
| abstract_inverted_index.by | 30 |
| abstract_inverted_index.in | 15 |
| abstract_inverted_index.of | 35, 112 |
| abstract_inverted_index.to | 11, 25, 84, 107 |
| abstract_inverted_index.we | 43 |
| abstract_inverted_index.DDN | 118 |
| abstract_inverted_index.The | 68 |
| abstract_inverted_index.and | 60, 77, 91 |
| abstract_inverted_index.gap | 27, 87 |
| abstract_inverted_index.may | 23 |
| abstract_inverted_index.the | 13, 18, 33, 45, 62, 86, 96, 109 |
| abstract_inverted_index.two | 52 |
| abstract_inverted_index.from | 104 |
| abstract_inverted_index.lead | 24 |
| abstract_inverted_index.that | 117 |
| abstract_inverted_index.this | 41 |
| abstract_inverted_index.uses | 72 |
| abstract_inverted_index.aimed | 55 |
| abstract_inverted_index.drawn | 103 |
| abstract_inverted_index.local | 79, 92 |
| abstract_inverted_index.state | 105 |
| abstract_inverted_index.under | 65 |
| abstract_inverted_index.which | 50 |
| abstract_inverted_index.(CTDE) | 9 |
| abstract_inverted_index.(DDN), | 49 |
| abstract_inverted_index.Double | 46 |
| abstract_inverted_index.across | 122 |
| abstract_inverted_index.during | 21 |
| abstract_inverted_index.errors | 28 |
| abstract_inverted_index.global | 74, 89 |
| abstract_inverted_index.module | 71, 99 |
| abstract_inverted_index.policy | 80 |
| abstract_inverted_index.robust | 58 |
| abstract_inverted_index.Network | 48 |
| abstract_inverted_index.agents, | 31 |
| abstract_inverted_index.agents. | 113 |
| abstract_inverted_index.between | 88 |
| abstract_inverted_index.employs | 4 |
| abstract_inverted_index.enhance | 108 |
| abstract_inverted_index.guiding | 75 |
| abstract_inverted_index.modules | 54 |
| abstract_inverted_index.network | 76 |
| abstract_inverted_index.partial | 19 |
| abstract_inverted_index.process | 64 |
| abstract_inverted_index.However, | 17 |
| abstract_inverted_index.external | 69 |
| abstract_inverted_index.gathered | 29 |
| abstract_inverted_index.improves | 120 |
| abstract_inverted_index.internal | 97 |
| abstract_inverted_index.learning | 2 |
| abstract_inverted_index.multiple | 123 |
| abstract_inverted_index.network, | 81 |
| abstract_inverted_index.overcome | 40 |
| abstract_inverted_index.rewards, | 102 |
| abstract_inverted_index.training | 34, 90 |
| abstract_inverted_index.Extensive | 114 |
| abstract_inverted_index.addition, | 95 |
| abstract_inverted_index.alleviate | 12 |
| abstract_inverted_index.effective | 36 |
| abstract_inverted_index.employing | 82 |
| abstract_inverted_index.enhancing | 57 |
| abstract_inverted_index.execution | 8, 22 |
| abstract_inverted_index.framework | 10 |
| abstract_inverted_index.impairing | 32 |
| abstract_inverted_index.intrinsic | 101 |
| abstract_inverted_index.introduce | 44 |
| abstract_inverted_index.policies. | 38 |
| abstract_inverted_index.reconcile | 85 |
| abstract_inverted_index.typically | 3 |
| abstract_inverted_index.challenge, | 42 |
| abstract_inverted_index.cumulative | 26 |
| abstract_inverted_index.execution. | 93 |
| abstract_inverted_index.introduces | 100 |
| abstract_inverted_index.scenarios. | 124 |
| abstract_inverted_index.Multi-agent | 0 |
| abstract_inverted_index.centralized | 6 |
| abstract_inverted_index.constrained | 66 |
| abstract_inverted_index.demonstrate | 116 |
| abstract_inverted_index.experiments | 115 |
| abstract_inverted_index.exploration | 110 |
| abstract_inverted_index.performance | 121 |
| abstract_inverted_index.Distillation | 47 |
| abstract_inverted_index.capabilities | 111 |
| abstract_inverted_index.coordination | 59 |
| abstract_inverted_index.distillation | 53, 70, 83, 98 |
| abstract_inverted_index.environment. | 16 |
| abstract_inverted_index.facilitating | 61 |
| abstract_inverted_index.incorporates | 51 |
| abstract_inverted_index.information, | 106 |
| abstract_inverted_index.information. | 67 |
| abstract_inverted_index.collaboration | 63 |
| abstract_inverted_index.collaborative | 37 |
| abstract_inverted_index.observability | 20 |
| abstract_inverted_index.reinforcement | 1 |
| abstract_inverted_index.significantly | 119 |
| abstract_inverted_index.non-stationarity | 14 |
| abstract_inverted_index.training-decentralized | 7 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 6 |
| citation_normalized_percentile |
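
The `abstract_inverted_index.*` rows above store the abstract as a map from each word to the positions where it occurs. As a rough sketch of how the plain text can be recovered, the following parses rows in the table's `| key | positions |` layout and reassembles the words in position order. The helper names `parse_rows` and `rebuild_abstract` are hypothetical, and the sample rows are deliberately truncated to positions 0 through 10 so that the result is a contiguous fragment.

```python
PREFIX = "abstract_inverted_index."


def parse_rows(rows):
    """Turn '| abstract_inverted_index.word | 1, 2 |' rows into {word: [1, 2]}."""
    index = {}
    for row in rows:
        cells = [c.strip() for c in row.strip().strip("|").split("|")]
        if len(cells) != 2 or not cells[0].startswith(PREFIX):
            continue  # skip rows that are not part of the inverted index
        word = cells[0][len(PREFIX):]
        index[word] = [int(p) for p in cells[1].split(",")]
    return index


def rebuild_abstract(inverted_index):
    """Place every word at each of its positions, then join in position order."""
    positions = {}
    for word, indices in inverted_index.items():
        for i in indices:
            positions[i] = word
    return " ".join(positions[i] for i in sorted(positions))


# Sample rows copied from the table, keeping only positions 0..10
# (for example, "a" also occurs at 73 and 78 in the full record).
sample_rows = [
    "| abstract_inverted_index.a | 5 |",
    "| abstract_inverted_index.(CTDE) | 9 |",
    "| abstract_inverted_index.employs | 4 |",
    "| abstract_inverted_index.learning | 2 |",
    "| abstract_inverted_index.execution | 8 |",
    "| abstract_inverted_index.framework | 10 |",
    "| abstract_inverted_index.typically | 3 |",
    "| abstract_inverted_index.Multi-agent | 0 |",
    "| abstract_inverted_index.centralized | 6 |",
    "| abstract_inverted_index.reinforcement | 1 |",
    "| abstract_inverted_index.training-decentralized | 7 |",
    "| cited_by_count | 0 |",
]

if __name__ == "__main__":
    print(rebuild_abstract(parse_rows(sample_rows)))
```

Feeding all of the index rows through the same two functions should yield the complete abstract, since every position from 0 to 124 appears exactly once in the table.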