Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning
· 2024
· Open Access
· DOI: https://doi.org/10.48550/arxiv.2412.12953
Diffusion Policies have become widely used in Imitation Learning, offering several appealing properties, such as generating multimodal and discontinuous behavior. As models grow larger to capture more complex capabilities, their computational demands increase, as recent scaling laws show. Continuing with current architectures will therefore present a computational roadblock. To address this challenge, we propose Mixture-of-Denoising Experts (MoDE) as a novel policy for Imitation Learning. MoDE surpasses current state-of-the-art Transformer-based Diffusion Policies while enabling parameter-efficient scaling through sparse experts and noise-conditioned routing, reducing active parameters by 40% and inference costs by 90% via expert caching. Our architecture combines this efficient scaling with a noise-conditioned self-attention mechanism, enabling more effective denoising across different noise levels. MoDE achieves state-of-the-art performance on 134 tasks in four established imitation learning benchmarks drawn from the CALVIN and LIBERO suites. Notably, by pretraining MoDE on diverse robotics data, we achieve an average rollout length of 4.01 on CALVIN ABC and a success rate of 0.95 on LIBERO-90. MoDE surpasses both CNN-based and Transformer Diffusion Policies by an average of 57% across the four benchmarks, while using 90% fewer FLOPs and fewer active parameters than default Diffusion Transformer architectures. Furthermore, we conduct comprehensive ablations on MoDE's components, providing insights for designing efficient and scalable Transformer architectures for Diffusion Policies. Code and demonstrations are available at https://mbreuss.github.io/MoDE_Diffusion_Policy/.
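The efficiency argument above rests on routing that depends on the noise level rather than on the tokens. The following NumPy sketch illustrates that idea under loudly stated assumptions: the sinusoidal log-sigma embedding, the linear top-k router, the ReLU MLP experts, and every name in it (`noise_embedding`, `NoiseConditionedMoE`, `route_cache`) are hypothetical and are not MoDE's actual implementation. Because `route` never looks at the token input `x`, its result can be computed once per noise level and reused, which is what makes expert caching possible in principle.

```python
import numpy as np


def noise_embedding(sigma: float, dim: int = 16) -> np.ndarray:
    """Sinusoidal embedding of log(sigma); an assumed choice for this sketch."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    angles = np.log(sigma) * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])


class NoiseConditionedMoE:
    """Hypothetical sparse MoE feed-forward layer routed by noise level only."""

    def __init__(self, d_model=32, d_hidden=64, n_experts=4, top_k=2, emb_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        self.top_k = top_k
        self.emb_dim = emb_dim
        self.w_router = rng.normal(0.0, 0.1, (emb_dim, n_experts))
        self.w_in = rng.normal(0.0, 0.1, (n_experts, d_model, d_hidden))
        self.w_out = rng.normal(0.0, 0.1, (n_experts, d_hidden, d_model))
        self.route_cache = {}  # sigma -> (expert indices, mixing weights)

    def route(self, sigma: float):
        # The router input is the noise embedding alone, never the tokens,
        # so the decision for a given sigma is computed once and then reused.
        if sigma not in self.route_cache:
            logits = noise_embedding(sigma, self.emb_dim) @ self.w_router
            chosen = np.argsort(logits)[-self.top_k:]  # top-k expert indices
            w = np.exp(logits[chosen] - logits[chosen].max())
            self.route_cache[sigma] = (chosen, w / w.sum())  # softmax over chosen
        return self.route_cache[sigma]

    def __call__(self, x: np.ndarray, sigma: float) -> np.ndarray:
        # x has shape (tokens, d_model); only the top_k chosen experts run.
        chosen, weights = self.route(sigma)
        out = np.zeros_like(x)
        for e, w in zip(chosen, weights):
            hidden = np.maximum(x @ self.w_in[e], 0.0)  # ReLU MLP expert
            out += w * (hidden @ self.w_out[e])
        return out


# Usage: one call per step of a hypothetical 10-step noise schedule.
layer = NoiseConditionedMoE()
sigmas = [float(s) for s in np.geomspace(80.0, 0.002, num=10)]
x = np.random.default_rng(1).normal(size=(8, 32))
outputs = [layer(x, s) for s in sigmas]
```

In this sketch only the routing decision is cached; a second pass over the same schedule never recomputes the router, and the selected experts are known before any tokens arrive. The savings quoted in the abstract (40% fewer active parameters, 90% lower inference cost) are the paper's results, not something this toy layer demonstrates.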
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2412.12953
- PDF: https://arxiv.org/pdf/2412.12953
- OA Status: green
- Cited By: 1
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4405562544
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4405562544
- DOI: https://doi.org/10.48550/arxiv.2412.12953
- Title: Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning
- Type: preprint
- Language: en
- Publication year: 2024
- Publication date: 2024-12-17
- Authors: Moritz Reuss, Jyothish Pari, Pulkit Agrawal, Rudolf Lioutikov
- Landing page: https://arxiv.org/abs/2412.12953
- PDF URL: https://arxiv.org/pdf/2412.12953
- Open access: Yes
- OA status: green
- OA URL: https://arxiv.org/pdf/2412.12953
- Concepts: Transformer, Computer science, Electrical engineering, Engineering, Voltage
- Cited by: 1
- Citations by year (recent): 2025: 1
- Related works (count): 10
Full payload
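The payload table uses flattened keys: nested objects are joined with dots and list elements are indexed, so `authorships[0].author.display_name` is the display name of the first author. The sketch below shows one way to produce such rows from a nested record. It is an assumption about how this page renders the data, inferred from the rows themselves (lists of plain values are joined with commas, null values become empty cells); `flatten` and `to_row` are hypothetical helpers.

```python
from typing import Any


def flatten(obj: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts and lists of objects into dotted, indexed keys."""
    flat: dict[str, Any] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            flat.update(flatten(value, path))
    elif isinstance(obj, list) and obj and all(isinstance(v, dict) for v in obj):
        # Lists of objects get an index per element, e.g. authorships[0].
        for i, value in enumerate(obj):
            flat.update(flatten(value, f"{prefix}[{i}]"))
    else:
        # Scalars, None, and lists of scalars become a single cell.
        flat[prefix] = obj
    return flat


def to_row(key: str, value: Any) -> str:
    """Render one flattened field as a two-column Markdown table row."""
    if value is None:
        cell = ""
    elif isinstance(value, list):
        cell = ", ".join(str(v) for v in value)  # e.g. "arxiv, datacite"
    else:
        cell = str(value)
    return f"| {key} | {cell} |" if cell else f"| {key} | |"


# Excerpt of the work record, using only values that appear in the table.
record = {
    "id": "https://openalex.org/W4405562544",
    "ids": {"doi": "https://doi.org/10.48550/arxiv.2412.12953"},
    "fwci": None,
    "indexed_in": ["arxiv", "datacite"],
    "authorships": [{"author": {"display_name": "Moritz Reuss"}, "author_position": "first"}],
    "counts_by_year": [{"year": 2025, "cited_by_count": 1}],
}
for key, value in flatten(record).items():
    print(to_row(key, value))
```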
| id | https://openalex.org/W4405562544 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2412.12953 |
| ids.doi | https://doi.org/10.48550/arxiv.2412.12953 |
| ids.openalex | https://openalex.org/W4405562544 |
| fwci | |
| type | preprint |
| title | Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10320 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.8427000045776367 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Neural Networks and Applications |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C66322947 |
| concepts[0].level | 3 |
| concepts[0].score | 0.6591969728469849 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q11658 |
| concepts[0].display_name | Transformer |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.492123544216156 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C119599485 |
| concepts[2].level | 1 |
| concepts[2].score | 0.22050225734710693 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q43035 |
| concepts[2].display_name | Electrical engineering |
| concepts[3].id | https://openalex.org/C127413603 |
| concepts[3].level | 0 |
| concepts[3].score | 0.15427592396736145 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q11023 |
| concepts[3].display_name | Engineering |
| concepts[4].id | https://openalex.org/C165801399 |
| concepts[4].level | 2 |
| concepts[4].score | 0.08585035800933838 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q25428 |
| concepts[4].display_name | Voltage |
| keywords[0].id | https://openalex.org/keywords/transformer |
| keywords[0].score | 0.6591969728469849 |
| keywords[0].display_name | Transformer |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.492123544216156 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/electrical-engineering |
| keywords[2].score | 0.22050225734710693 |
| keywords[2].display_name | Electrical engineering |
| keywords[3].id | https://openalex.org/keywords/engineering |
| keywords[3].score | 0.15427592396736145 |
| keywords[3].display_name | Engineering |
| keywords[4].id | https://openalex.org/keywords/voltage |
| keywords[4].score | 0.08585035800933838 |
| keywords[4].display_name | Voltage |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2412.12953 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2412.12953 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2412.12953 |
| locations[1].id | doi:10.48550/arxiv.2412.12953 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2412.12953 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5056556464 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Moritz Reuss |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Reuss, Moritz |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5058187411 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Jyothish Pari |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Pari, Jyothish |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5047179830 |
| authorships[2].author.orcid | https://orcid.org/0000-0001-9894-9625 |
| authorships[2].author.display_name | Priyanka Agrawal |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Agrawal, Pulkit |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5033283742 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-8924-7514 |
| authorships[3].author.display_name | Rudolf Lioutikov |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Lioutikov, Rudolf |
| authorships[3].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2412.12953 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2024-12-19T00:00:00 |
| display_name | Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10320 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.8427000045776367 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Neural Networks and Applications |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2899084033, https://openalex.org/W2748952813, https://openalex.org/W2390279801, https://openalex.org/W4391913857, https://openalex.org/W2358668433, https://openalex.org/W4396701345, https://openalex.org/W2376932109, https://openalex.org/W2001405890, https://openalex.org/W4396696052 |
| cited_by_count | 1 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2412.12953 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2412.12953 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2412.12953 |
| primary_location.id | pmh:oai:arXiv.org:2412.12953 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2412.12953 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2412.12953 |
| publication_date | 2024-12-17 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.4 | 164 |
| abstract_inverted_index.a | 48, 61 |
| abstract_inverted_index.As | 20 |
| abstract_inverted_index.It | 150 |
| abstract_inverted_index.To | 51 |
| abstract_inverted_index.an | 159 |
| abstract_inverted_index.as | 14, 34, 60 |
| abstract_inverted_index.at | 206 |
| abstract_inverted_index.by | 36, 88, 93, 133, 158 |
| abstract_inverted_index.in | 6, 123 |
| abstract_inverted_index.of | 161 |
| abstract_inverted_index.on | 120, 136, 143, 148, 186 |
| abstract_inverted_index.to | 25, 176 |
| abstract_inverted_index.we | 55, 140, 182 |
| abstract_inverted_index.134 | 121 |
| abstract_inverted_index.40% | 89 |
| abstract_inverted_index.57% | 162 |
| abstract_inverted_index.90% | 94, 168 |
| abstract_inverted_index.ABC | 145 |
| abstract_inverted_index.Our | 98 |
| abstract_inverted_index.and | 17, 81, 90, 130, 146, 154, 171, 194, 202 |
| abstract_inverted_index.are | 22, 204 |
| abstract_inverted_index.for | 64, 191, 198 |
| abstract_inverted_index.the | 43 |
| abstract_inverted_index.via | 95 |
| abstract_inverted_index.0.95 | 147 |
| abstract_inverted_index.4.01 | 142 |
| abstract_inverted_index.Code | 201 |
| abstract_inverted_index.MoDE | 67, 116, 135 |
| abstract_inverted_index.both | 85, 152 |
| abstract_inverted_index.four | 124 |
| abstract_inverted_index.gap, | 54 |
| abstract_inverted_index.have | 2 |
| abstract_inverted_index.more | 27, 109 |
| abstract_inverted_index.such | 13 |
| abstract_inverted_index.this | 53, 101 |
| abstract_inverted_index.used | 5 |
| abstract_inverted_index.will | 46 |
| abstract_inverted_index.with | 42, 104 |
| abstract_inverted_index.FLOPs | 170 |
| abstract_inverted_index.costs | 92 |
| abstract_inverted_index.data, | 139 |
| abstract_inverted_index.fewer | 169, 172 |
| abstract_inverted_index.laws. | 39 |
| abstract_inverted_index.noise | 114 |
| abstract_inverted_index.novel | 62 |
| abstract_inverted_index.shown | 35 |
| abstract_inverted_index.tasks | 122 |
| abstract_inverted_index.their | 30 |
| abstract_inverted_index.using | 167 |
| abstract_inverted_index.while | 74, 166 |
| abstract_inverted_index.(MoDE) | 59 |
| abstract_inverted_index.CALVIN | 144 |
| abstract_inverted_index.MoDE's | 187 |
| abstract_inverted_index.across | 112, 163 |
| abstract_inverted_index.active | 86, 173 |
| abstract_inverted_index.become | 3 |
| abstract_inverted_index.expert | 96 |
| abstract_inverted_index.larger | 24 |
| abstract_inverted_index.models | 21 |
| abstract_inverted_index.policy | 63 |
| abstract_inverted_index.recent | 37 |
| abstract_inverted_index.sparse | 79 |
| abstract_inverted_index.widely | 4 |
| abstract_inverted_index.(CALVIN | 129 |
| abstract_inverted_index.Experts | 58 |
| abstract_inverted_index.achieve | 141 |
| abstract_inverted_index.address | 52 |
| abstract_inverted_index.average | 160 |
| abstract_inverted_index.capture | 26 |
| abstract_inverted_index.complex | 28 |
| abstract_inverted_index.conduct | 183 |
| abstract_inverted_index.current | 44, 69 |
| abstract_inverted_index.default | 177 |
| abstract_inverted_index.demands | 32 |
| abstract_inverted_index.diverse | 137 |
| abstract_inverted_index.experts | 80 |
| abstract_inverted_index.levels. | 115 |
| abstract_inverted_index.present | 47 |
| abstract_inverted_index.propose | 56 |
| abstract_inverted_index.scaling | 38, 77, 103 |
| abstract_inverted_index.several | 10 |
| abstract_inverted_index.through | 78 |
| abstract_inverted_index.LIBERO). | 131 |
| abstract_inverted_index.Notably, | 132 |
| abstract_inverted_index.Policies | 1, 73, 157 |
| abstract_inverted_index.achieves | 117 |
| abstract_inverted_index.becoming | 23 |
| abstract_inverted_index.caching. | 97 |
| abstract_inverted_index.combines | 100 |
| abstract_inverted_index.compared | 175 |
| abstract_inverted_index.enabling | 75, 108 |
| abstract_inverted_index.insights | 190 |
| abstract_inverted_index.learning | 127 |
| abstract_inverted_index.offering | 9 |
| abstract_inverted_index.reducing | 84 |
| abstract_inverted_index.robotics | 138 |
| abstract_inverted_index.routing, | 83 |
| abstract_inverted_index.scalable | 195 |
| abstract_inverted_index.CNN-based | 153 |
| abstract_inverted_index.Diffusion | 0, 72, 156, 178, 199 |
| abstract_inverted_index.Imitation | 7, 65 |
| abstract_inverted_index.Learning, | 8 |
| abstract_inverted_index.Learning. | 66 |
| abstract_inverted_index.Policies. | 200 |
| abstract_inverted_index.ablations | 185 |
| abstract_inverted_index.appealing | 11 |
| abstract_inverted_index.available | 205 |
| abstract_inverted_index.behavior. | 19 |
| abstract_inverted_index.denoising | 111 |
| abstract_inverted_index.designing | 192 |
| abstract_inverted_index.different | 113 |
| abstract_inverted_index.effective | 110 |
| abstract_inverted_index.efficient | 102, 193 |
| abstract_inverted_index.imitation | 126 |
| abstract_inverted_index.increase, | 33 |
| abstract_inverted_index.inference | 91 |
| abstract_inverted_index.providing | 189 |
| abstract_inverted_index.surpasses | 68, 151 |
| abstract_inverted_index.LIBERO-90. | 149 |
| abstract_inverted_index.Therefore, | 40 |
| abstract_inverted_index.benchmarks | 128 |
| abstract_inverted_index.continuing | 41 |
| abstract_inverted_index.generating | 15 |
| abstract_inverted_index.mechanism, | 107 |
| abstract_inverted_index.multimodal | 16 |
| abstract_inverted_index.parameters | 87, 174 |
| abstract_inverted_index.roadblock. | 50 |
| abstract_inverted_index.Transformer | 155, 179, 196 |
| abstract_inverted_index.benchmarks, | 165 |
| abstract_inverted_index.components, | 188 |
| abstract_inverted_index.established | 125 |
| abstract_inverted_index.performance | 119 |
| abstract_inverted_index.pretraining | 134 |
| abstract_inverted_index.properties, | 12 |
| abstract_inverted_index.Furthermore, | 181 |
| abstract_inverted_index.architecture | 99 |
| abstract_inverted_index.architectures | 45, 197 |
| abstract_inverted_index.capabilities, | 29 |
| abstract_inverted_index.comprehensive | 184 |
| abstract_inverted_index.computational | 31, 49 |
| abstract_inverted_index.discontinuous | 18 |
| abstract_inverted_index.architectures. | 180 |
| abstract_inverted_index.demonstrations | 203 |
| abstract_inverted_index.self-attention | 106 |
| abstract_inverted_index.state-of-the-art | 70, 118 |
| abstract_inverted_index.Transformer-based | 71 |
| abstract_inverted_index.noise-conditioned | 82, 105 |
| abstract_inverted_index.parameter-efficient | 76 |
| abstract_inverted_index.Mixture-of-Denoising | 57 |
| abstract_inverted_index.https://mbreuss.github.io/MoDE_Diffusion_Policy/. | 207 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 4 |
| citation_normalized_percentile | |
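The `abstract_inverted_index.*` rows store the abstract in OpenAlex's inverted-index form: each key is a word (with any attached punctuation) and each value lists the zero-based positions where that word occurs, so `Diffusion | 0, 72, 156, 178, 199` means the abstract's first word is "Diffusion". Reading the positions back in order recovers the text. A minimal sketch follows; `reconstruct_abstract` is a hypothetical helper, and the example uses only the words at positions 0 to 8 from the rows above.

```python
def reconstruct_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild plain text from a word -> positions inverted index."""
    words_at: dict[int, str] = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            words_at[pos] = word
    # A complete index covers every position from 0 to n - 1.
    if sorted(words_at) != list(range(len(words_at))):
        raise ValueError("inverted index has gaps; is it truncated?")
    return " ".join(words_at[pos] for pos in range(len(words_at)))


# Positions 0-8 only, taken from the abstract_inverted_index rows.
excerpt = {
    "Diffusion": [0], "Policies": [1], "have": [2], "become": [3], "widely": [4],
    "used": [5], "in": [6], "Imitation": [7], "Learning,": [8],
}
print(reconstruct_abstract(excerpt))
# Diffusion Policies have become widely used in Imitation Learning,
```

The gap check matters because feeding in the full position lists for only a few words (for example all five positions of "Diffusion") would otherwise silently produce a scrambled string.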