CMoE: Converting Mixture-of-Experts from Dense to Accelerate LLM Inference
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2502.04416
Scaling large language models (LLMs) improves performance but dramatically increases inference costs. The feed-forward network (FFN), which consumes approximately 70% of inference compute, is a critical bottleneck, particularly at large batch sizes. While mixture-of-experts (MoE) architectures exploit activation sparsity for efficiency, converting existing dense models to MoEs has traditionally required resource-intensive continual pre-training. We present CMoE, a framework that rapidly transforms dense LLMs into MoEs without training. The key idea is to analyze FFN neuron activations and partition the neurons into shared (always active) and routed experts. Routed neurons are clustered using a balanced assignment algorithm, and a differentiable router is constructed analytically from activation statistics, enabling immediate deployment or optional lightweight fine-tuning. Experiments show that, with an activation ratio of 75%, CMoE is lossless in terms of perplexity while still delivering a 5% acceleration. Further experiments show that a CMoE configuration activating just 25% of parameters reduces end-to-end latency by 1.5x while preserving usable perplexity without additional training. Moreover, a brief LoRA fine-tuning process (requiring only 1 hour and 2,000 samples) recovers over 76% of the dense model's downstream accuracy. By balancing performance and efficiency, CMoE offers a viable path for deploying LLMs in real-world scenarios where computational resources are limited. We make our code publicly available at https://github.com/JarvisPei/CMoE.
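
The partitioning step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (their code is at the repository linked above). It assumes that a neuron counts as "active" for a token when it is among the `top_k` largest activations by magnitude, that shared neurons are simply those with the highest activation rate, and it substitutes a plain greedy capacity-limited assignment for whatever balanced assignment algorithm the paper uses. All function names, the seed choice, and the toy data are hypothetical, and the analytically constructed router is not covered.

```python
def activation_stats(acts, top_k):
    """Return per-neuron activation rates and per-token binary markers.

    acts: list of tokens, each a list of FFN hidden activations.
    Assumption: a neuron is "active" for a token if it is among the
    top_k activations by absolute value.
    """
    n = len(acts[0])
    markers = []
    for row in acts:
        top = sorted(range(n), key=lambda i: abs(row[i]), reverse=True)[:top_k]
        chosen = set(top)
        markers.append([int(i in chosen) for i in range(n)])
    rates = [sum(m[i] for m in markers) / len(markers) for i in range(n)]
    return rates, markers


def split_shared_routed(rates, n_shared):
    """Neurons with the highest activation rate become shared; the rest are routed."""
    order = sorted(range(len(rates)), key=lambda i: (-rates[i], i))
    return sorted(order[:n_shared]), sorted(order[n_shared:])


def balanced_assign(routed, markers, n_experts):
    """Greedy stand-in for balanced clustering: every expert gets the same size."""
    if len(routed) % n_experts != 0:
        raise ValueError("routed neurons must divide evenly among experts")
    cap = len(routed) // n_experts
    # Feature of a neuron = its activation pattern across the calibration tokens.
    feats = {i: [m[i] for m in markers] for i in routed}
    seeds = routed[:n_experts]  # hypothetical choice of initial centroids
    pairs = []
    for i in routed:
        for g, s in enumerate(seeds):
            dist = sum((a - b) ** 2 for a, b in zip(feats[i], feats[s]))
            pairs.append((dist, i, g))
    pairs.sort()
    groups = [[] for _ in range(n_experts)]
    assigned = set()
    # Closest pairs first; skip neurons already placed and experts already full.
    for _, i, g in pairs:
        if i not in assigned and len(groups[g]) < cap:
            groups[g].append(i)
            assigned.add(i)
    return groups


# Toy calibration data: 4 tokens, 6 FFN neurons (made-up values).
acts = [
    [9.0, 0.1, 5.0, 0.2, 0.3, 0.1],
    [8.0, 4.0, 0.1, 0.2, 0.1, 0.3],
    [7.0, 0.1, 0.2, 6.0, 0.1, 0.2],
    [9.0, 0.2, 0.1, 0.1, 5.0, 0.3],
]
rates, markers = activation_stats(acts, top_k=2)
shared, routed = split_shared_routed(rates, n_shared=2)
groups = balanced_assign(routed, markers, n_experts=2)
print("shared:", shared, "routed experts:", groups)
```

In the toy data, neuron 0 is active for every token and therefore lands in the shared set, while the remaining neurons are split into two equally sized routed experts. The equal capacity per expert is what the "balanced" constraint buys: each routed expert costs the same to evaluate.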
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2502.04416
- PDF URL: https://arxiv.org/pdf/2502.04416
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4407308521
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4407308521 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2502.04416 (Digital Object Identifier)
- Title: CMoE: Converting Mixture-of-Experts from Dense to Accelerate LLM Inference (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025 (year of publication)
- Publication date: 2025-02-06 (full publication date if available)
- Authors: Zehua Pei, Lan Zou, Hui-Ling Zhen, X. D. Yu, Wulong Liu, Sinno Jialin Pan, Mingxuan Yuan, Bei Yu (list of authors in order)
- Landing page: https://arxiv.org/abs/2502.04416 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2502.04416 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2502.04416 (direct OA link when available)
- Concepts: Carving, Inference, Computer science, Artificial intelligence, Art, Visual arts (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4407308521 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2502.04416 |
| ids.doi | https://doi.org/10.48550/arxiv.2502.04416 |
| ids.openalex | https://openalex.org/W4407308521 |
| fwci | |
| type | preprint |
| title | CMoE: Converting Mixture-of-Experts from Dense to Accelerate LLM Inference |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10201 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9430000185966492 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Speech Recognition and Synthesis |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C2777370761 |
| concepts[0].level | 2 |
| concepts[0].score | 0.8958708047866821 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q18448934 |
| concepts[0].display_name | Carving |
| concepts[1].id | https://openalex.org/C2776214188 |
| concepts[1].level | 2 |
| concepts[1].score | 0.7638281583786011 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q408386 |
| concepts[1].display_name | Inference |
| concepts[2].id | https://openalex.org/C41008148 |
| concepts[2].level | 0 |
| concepts[2].score | 0.4613404870033264 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[2].display_name | Computer science |
| concepts[3].id | https://openalex.org/C154945302 |
| concepts[3].level | 1 |
| concepts[3].score | 0.35500872135162354 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[3].display_name | Artificial intelligence |
| concepts[4].id | https://openalex.org/C142362112 |
| concepts[4].level | 0 |
| concepts[4].score | 0.15716075897216797 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q735 |
| concepts[4].display_name | Art |
| concepts[5].id | https://openalex.org/C153349607 |
| concepts[5].level | 1 |
| concepts[5].score | 0.12826502323150635 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q36649 |
| concepts[5].display_name | Visual arts |
| keywords[0].id | https://openalex.org/keywords/carving |
| keywords[0].score | 0.8958708047866821 |
| keywords[0].display_name | Carving |
| keywords[1].id | https://openalex.org/keywords/inference |
| keywords[1].score | 0.7638281583786011 |
| keywords[1].display_name | Inference |
| keywords[2].id | https://openalex.org/keywords/computer-science |
| keywords[2].score | 0.4613404870033264 |
| keywords[2].display_name | Computer science |
| keywords[3].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[3].score | 0.35500872135162354 |
| keywords[3].display_name | Artificial intelligence |
| keywords[4].id | https://openalex.org/keywords/art |
| keywords[4].score | 0.15716075897216797 |
| keywords[4].display_name | Art |
| keywords[5].id | https://openalex.org/keywords/visual-arts |
| keywords[5].score | 0.12826502323150635 |
| keywords[5].display_name | Visual arts |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2502.04416 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2502.04416 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2502.04416 |
| locations[1].id | doi:10.48550/arxiv.2502.04416 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2502.04416 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5071208161 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-2884-2459 |
| authorships[0].author.display_name | Zehua Pei |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Pei, Zehua |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5040299913 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-2443-0817 |
| authorships[1].author.display_name | Lan Zou |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Zou, Lancheng |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5108154818 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Hui-Ling Zhen |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Zhen, Hui-Ling |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5102718069 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-6060-2383 |
| authorships[3].author.display_name | X. D. Yu |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Yu, Xianzhi |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5036527488 |
| authorships[4].author.orcid | https://orcid.org/0000-0003-2901-1523 |
| authorships[4].author.display_name | Wulong Liu |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Liu, Wulong |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5082984558 |
| authorships[5].author.orcid | https://orcid.org/0000-0001-6565-3836 |
| authorships[5].author.display_name | Sinno Jialin Pan |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Pan, Sinno Jialin |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5078949174 |
| authorships[6].author.orcid | https://orcid.org/0000-0002-2236-8784 |
| authorships[6].author.display_name | Mingxuan Yuan |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Yuan, Mingxuan |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5051340429 |
| authorships[7].author.orcid | https://orcid.org/0000-0001-6406-4810 |
| authorships[7].author.display_name | Bei Yu |
| authorships[7].author_position | last |
| authorships[7].raw_author_name | Yu, Bei |
| authorships[7].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2502.04416 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-02-11T00:00:00 |
| display_name | CMoE: Converting Mixture-of-Experts from Dense to Accelerate LLM Inference |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10201 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9430000185966492 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Speech Recognition and Synthesis |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2899084033, https://openalex.org/W2748952813, https://openalex.org/W1598065702, https://openalex.org/W2990263010, https://openalex.org/W2529165695, https://openalex.org/W2361515432, https://openalex.org/W2389624439, https://openalex.org/W2770783854, https://openalex.org/W2364013810 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2502.04416 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2502.04416 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2502.04416 |
| primary_location.id | pmh:oai:arXiv.org:2502.04416 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2502.04416 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2502.04416 |
| publication_date | 2025-02-06 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.1 | 168 |
| abstract_inverted_index.a | 23, 55, 90, 95, 133, 140, 161, 191 |
| abstract_inverted_index.By | 183 |
| abstract_inverted_index.We | 52, 206 |
| abstract_inverted_index.at | 212 |
| abstract_inverted_index.by | 151 |
| abstract_inverted_index.in | 27, 70, 126, 198 |
| abstract_inverted_index.is | 98 |
| abstract_inverted_index.it | 119 |
| abstract_inverted_index.of | 19, 117, 128, 146, 177 |
| abstract_inverted_index.or | 107 |
| abstract_inverted_index.to | 45, 75 |
| abstract_inverted_index.5\% | 134 |
| abstract_inverted_index.FFN | 72 |
| abstract_inverted_index.The | 12, 66 |
| abstract_inverted_index.and | 82, 94, 170, 187 |
| abstract_inverted_index.are | 87, 204 |
| abstract_inverted_index.but | 7 |
| abstract_inverted_index.for | 39, 195 |
| abstract_inverted_index.key | 67 |
| abstract_inverted_index.our | 208 |
| abstract_inverted_index.the | 178 |
| abstract_inverted_index.1.5x | 152 |
| abstract_inverted_index.25\% | 145 |
| abstract_inverted_index.70\% | 18 |
| abstract_inverted_index.76\% | 176 |
| abstract_inverted_index.CMoE | 141, 189 |
| abstract_inverted_index.LLMs | 61, 197 |
| abstract_inverted_index.LoRA | 163 |
| abstract_inverted_index.MoEs | 46, 63 |
| abstract_inverted_index.code | 209 |
| abstract_inverted_index.from | 101 |
| abstract_inverted_index.hour | 169 |
| abstract_inverted_index.into | 62, 78 |
| abstract_inverted_index.just | 144 |
| abstract_inverted_index.lies | 69 |
| abstract_inverted_index.make | 207 |
| abstract_inverted_index.only | 167 |
| abstract_inverted_index.over | 175 |
| abstract_inverted_index.path | 193 |
| abstract_inverted_index.size | 30 |
| abstract_inverted_index.that | 57, 139 |
| abstract_inverted_index.them | 77 |
| abstract_inverted_index.with | 114 |
| abstract_inverted_index.(MoE) | 34 |
| abstract_inverted_index.2,000 | 171 |
| abstract_inverted_index.75\%, | 118 |
| abstract_inverted_index.CMoE, | 54 |
| abstract_inverted_index.While | 32 |
| abstract_inverted_index.batch | 29 |
| abstract_inverted_index.brief | 162 |
| abstract_inverted_index.dense | 43, 60, 179 |
| abstract_inverted_index.large | 1, 28 |
| abstract_inverted_index.ratio | 116 |
| abstract_inverted_index.still | 131 |
| abstract_inverted_index.terms | 127 |
| abstract_inverted_index.that, | 113 |
| abstract_inverted_index.using | 89 |
| abstract_inverted_index.where | 201 |
| abstract_inverted_index.while | 130, 153 |
| abstract_inverted_index.(FFN), | 15 |
| abstract_inverted_index.(LLMs) | 4 |
| abstract_inverted_index.Routed | 85 |
| abstract_inverted_index.costs. | 11 |
| abstract_inverted_index.models | 3, 44 |
| abstract_inverted_index.neuron | 73 |
| abstract_inverted_index.offers | 190 |
| abstract_inverted_index.reveal | 138 |
| abstract_inverted_index.routed | 83 |
| abstract_inverted_index.router | 97 |
| abstract_inverted_index.shared | 79 |
| abstract_inverted_index.usable | 155 |
| abstract_inverted_index.viable | 192 |
| abstract_inverted_index.(always | 80 |
| abstract_inverted_index.Further | 136 |
| abstract_inverted_index.Scaling | 0 |
| abstract_inverted_index.active) | 81 |
| abstract_inverted_index.forward | 194 |
| abstract_inverted_index.latency | 150 |
| abstract_inverted_index.model's | 180 |
| abstract_inverted_index.network | 14 |
| abstract_inverted_index.neurons | 86 |
| abstract_inverted_index.present | 53 |
| abstract_inverted_index.process | 165 |
| abstract_inverted_index.rapidly | 58 |
| abstract_inverted_index.reduces | 148 |
| abstract_inverted_index.without | 64, 157 |
| abstract_inverted_index.achieves | 120 |
| abstract_inverted_index.balanced | 91 |
| abstract_inverted_index.compute, | 21 |
| abstract_inverted_index.critical | 24 |
| abstract_inverted_index.enabling | 104 |
| abstract_inverted_index.existing | 42 |
| abstract_inverted_index.experts. | 84 |
| abstract_inverted_index.improves | 5 |
| abstract_inverted_index.language | 2 |
| abstract_inverted_index.leverage | 36 |
| abstract_inverted_index.limited. | 205 |
| abstract_inverted_index.lossless | 124 |
| abstract_inverted_index.optional | 108 |
| abstract_inverted_index.publicly | 210 |
| abstract_inverted_index.recovers | 174 |
| abstract_inverted_index.requires | 48 |
| abstract_inverted_index.results, | 122 |
| abstract_inverted_index.samples) | 172 |
| abstract_inverted_index.sparsity | 38 |
| abstract_inverted_index.Moreover, | 160 |
| abstract_inverted_index.accuracy. | 182 |
| abstract_inverted_index.analyzing | 71 |
| abstract_inverted_index.available | 211 |
| abstract_inverted_index.balancing | 185 |
| abstract_inverted_index.clustered | 88 |
| abstract_inverted_index.consuming | 16 |
| abstract_inverted_index.continual | 50 |
| abstract_inverted_index.deploying | 196 |
| abstract_inverted_index.framework | 56 |
| abstract_inverted_index.immediate | 105 |
| abstract_inverted_index.increases | 9 |
| abstract_inverted_index.inference | 10, 20 |
| abstract_inverted_index.partition | 76 |
| abstract_inverted_index.precision | 125 |
| abstract_inverted_index.resources | 203 |
| abstract_inverted_index.scenarios | 200 |
| abstract_inverted_index.training. | 65, 159 |
| abstract_inverted_index.(requiring | 166 |
| abstract_inverted_index.activating | 143 |
| abstract_inverted_index.activation | 37, 102, 115 |
| abstract_inverted_index.additional | 158 |
| abstract_inverted_index.algorithm, | 93 |
| abstract_inverted_index.assignment | 92 |
| abstract_inverted_index.converting | 41 |
| abstract_inverted_index.delivering | 123 |
| abstract_inverted_index.deployment | 106 |
| abstract_inverted_index.downstream | 181 |
| abstract_inverted_index.end-to-end | 149 |
| abstract_inverted_index.innovation | 68 |
| abstract_inverted_index.parameters | 147 |
| abstract_inverted_index.perplexity | 129, 156 |
| abstract_inverted_index.preserving | 154 |
| abstract_inverted_index.real-world | 199 |
| abstract_inverted_index.remarkable | 121 |
| abstract_inverted_index.represents | 22 |
| abstract_inverted_index.scenarios. | 31 |
| abstract_inverted_index.transforms | 59 |
| abstract_inverted_index.Experiments | 111 |
| abstract_inverted_index.activations | 74 |
| abstract_inverted_index.bottleneck, | 25 |
| abstract_inverted_index.constructed | 99 |
| abstract_inverted_index.demonstrate | 112 |
| abstract_inverted_index.effectively | 184 |
| abstract_inverted_index.efficiency, | 40, 188 |
| abstract_inverted_index.experiments | 137 |
| abstract_inverted_index.fine-tuning | 164 |
| abstract_inverted_index.lightweight | 109 |
| abstract_inverted_index.maintaining | 132 |
| abstract_inverted_index.performance | 6, 186 |
| abstract_inverted_index.statistics, | 103 |
| abstract_inverted_index.analytically | 100 |
| abstract_inverted_index.dramatically | 8 |
| abstract_inverted_index.feed-forward | 13 |
| abstract_inverted_index.fine-tuning. | 110 |
| abstract_inverted_index.particularly | 26 |
| abstract_inverted_index.successfully | 173 |
| abstract_inverted_index.acceleration. | 135 |
| abstract_inverted_index.approximately | 17 |
| abstract_inverted_index.architectures | 35 |
| abstract_inverted_index.computational | 202 |
| abstract_inverted_index.configuration | 142 |
| abstract_inverted_index.pre-training. | 51 |
| abstract_inverted_index.traditionally | 47 |
| abstract_inverted_index.differentiable | 96 |
| abstract_inverted_index.mixture-of-experts | 33 |
| abstract_inverted_index.resource-intensive | 49 |
| abstract_inverted_index.https://github.com/JarvisPei/CMoE. | 213 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 8 |
| citation_normalized_percentile | |
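
The `abstract_inverted_index.*` rows in the payload store the abstract as a map from each word to the positions where it occurs. A minimal sketch of how such rows could be turned back into running text follows. The helper names `parse_rows` and `rebuild_abstract` are hypothetical, and the sample rows are a small subset copied from the table above, so only the first eight positions are rebuilt.

```python
PREFIX = "abstract_inverted_index."


def parse_rows(rows):
    """Parse '| abstract_inverted_index.<word> | p1, p2 |' rows into {word: [positions]}."""
    index = {}
    for row in rows:
        cells = [c.strip() for c in row.strip().strip("|").split("|")]
        if len(cells) == 2 and cells[0].startswith(PREFIX):
            word = cells[0][len(PREFIX):]
            index[word] = [int(p) for p in cells[1].split(",")]
    return index


def rebuild_abstract(index, limit=None):
    """Place each word at its positions and join them in position order.

    limit: if given, keep only positions below it (useful for a partial index).
    Missing positions are skipped silently rather than reported.
    """
    by_position = {}
    for word, positions in index.items():
        for p in positions:
            if limit is None or p < limit:
                by_position[p] = word
    return " ".join(by_position[p] for p in sorted(by_position))


# Subset of rows from the payload table (positions 0 to 7 of the abstract).
rows = [
    "| abstract_inverted_index.but | 7 |",
    "| abstract_inverted_index.large | 1, 28 |",
    "| abstract_inverted_index.(LLMs) | 4 |",
    "| abstract_inverted_index.models | 3, 44 |",
    "| abstract_inverted_index.Scaling | 0 |",
    "| abstract_inverted_index.improves | 5 |",
    "| abstract_inverted_index.language | 2 |",
    "| abstract_inverted_index.performance | 6, 186 |",
]
index = parse_rows(rows)
print(rebuild_abstract(index, limit=8))
# → Scaling large language models (LLMs) improves performance but
```

Run over the complete set of rows with `limit=None`, the same function would yield the full abstract, since every position from 0 to 213 appears exactly once in the index.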