ERMoE: Eigen-Reparameterized Mixture-of-Experts for Stable Routing and Interpretable Specialization
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2511.10971
Mixture-of-Experts (MoE) architectures expand model capacity by sparsely activating experts but face two core challenges: misalignment between router logits and each expert's internal structure leads to unstable routing and expert underutilization, and load imbalances create straggler bottlenecks. Standard solutions, such as auxiliary load-balancing losses, can reduce load disparities but often weaken expert specialization and hurt downstream performance. To address these issues, we propose ERMoE, a sparse MoE transformer that reparameterizes each expert in a learned orthonormal eigenbasis and replaces learned gating logits with an "Eigenbasis Score", defined as the cosine similarity between input features and an expert's basis. This content-aware routing ties token assignments directly to experts' representation spaces, stabilizing utilization and promoting interpretable specialization without sacrificing sparsity. Crucially, ERMoE removes the need for explicit balancing losses and avoids the interfering gradients they introduce. We show that ERMoE achieves state-of-the-art accuracy on ImageNet classification and cross-modal image-text retrieval benchmarks (e.g., COCO, Flickr30K), while naturally producing flatter expert load distributions. Moreover, a 3D MRI variant (ERMoE-ba) improves brain age prediction accuracy by more than 7% and yields anatomically interpretable expert specializations. ERMoE thus introduces a new architectural principle for sparse expert models that directly addresses routing instabilities and enables improved performance with scalable, interpretable specialization.
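The routing idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not say how the cosine similarity between a feature vector and a whole basis is computed, so the sketch assumes one plausible reading, namely the cosine of the angle between a token and its projection onto the subspace spanned by an expert's orthonormal basis, which equals ‖Uᵀx‖ / ‖x‖. The function names (`random_orthonormal_basis`, `eigenbasis_scores`, `route_top_k`), the random bases, and the top-k selection are all hypothetical choices made for illustration.

```python
import numpy as np

def random_orthonormal_basis(dim, rank, rng):
    # Assumption: stand-in for a learned basis. The QR factorization of a
    # Gaussian matrix yields a (dim, rank) matrix with orthonormal columns.
    q, _ = np.linalg.qr(rng.standard_normal((dim, rank)))
    return q

def eigenbasis_scores(x, bases, eps=1e-12):
    # x: (tokens, dim); bases: (experts, dim, rank) with orthonormal columns.
    # Assumed score: cosine between each token and its projection onto the
    # expert's subspace, i.e. ||U^T x|| / ||x||, which lies in [0, 1].
    coeffs = np.einsum("td,edr->ter", x, bases)   # projection coefficients
    proj_norm = np.linalg.norm(coeffs, axis=-1)   # (tokens, experts)
    x_norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return proj_norm / (x_norm + eps)

def route_top_k(scores, k):
    # Sparse routing: keep the k highest-scoring experts for every token.
    return np.argsort(-scores, axis=-1)[:, :k]

rng = np.random.default_rng(0)
bases = np.stack([random_orthonormal_basis(16, 4, rng) for _ in range(3)])
tokens = rng.standard_normal((5, 16))
scores = eigenbasis_scores(tokens, bases)
chosen = route_top_k(scores, k=2)   # (5, 2) expert indices per token
```

Because the score depends only on the token and the expert's own basis, no separate gating network appears in this sketch; whether the paper normalizes or combines the scores further is not stated in the abstract.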
Related Topics
- Type
- preprint
- Landing Page
- http://arxiv.org/abs/2511.10971
- https://arxiv.org/pdf/2511.10971
- OA Status
- green
- OpenAlex ID
- https://openalex.org/W4416341092
Raw OpenAlex JSON
- OpenAlex ID
- https://openalex.org/W4416341092 (canonical identifier for this work in OpenAlex)
- DOI
- https://doi.org/10.48550/arxiv.2511.10971 (Digital Object Identifier)
- Title
- ERMoE: Eigen-Reparameterized Mixture-of-Experts for Stable Routing and Interpretable Specialization (work title)
- Type
- preprint (OpenAlex work type)
- Publication year
- 2025 (year of publication)
- Publication date
- 2025-11-14 (full publication date if available)
- Authors
- Shukai Duan, Shixuan Li, Chenzhong Yin, Mingxi Cheng, Heng Ping, Tamoghna Chattopadhyay, Sophia I. Thomopoulos, Shahin Nazarian, Paul Thompson, Paul Bogdan (list of authors in order)
- Landing page
- https://arxiv.org/abs/2511.10971 (publisher landing page)
- PDF URL
- https://arxiv.org/pdf/2511.10971 (direct link to full text PDF)
- Open access
- Yes (whether a free full text is available)
- OA status
- green (open access status per OpenAlex)
- OA URL
- https://arxiv.org/pdf/2511.10971 (direct OA link when available)
- Cited by
- 0 (total citation count in OpenAlex)
Full payload
| id | https://openalex.org/W4416341092 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2511.10971 |
| ids.doi | https://doi.org/10.48550/arxiv.2511.10971 |
| ids.openalex | https://openalex.org/W4416341092 |
| fwci | |
| type | preprint |
| title | ERMoE: Eigen-Reparameterized Mixture-of-Experts for Stable Routing and Interpretable Specialization |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | |
| locations[0].id | pmh:oai:arXiv.org:2511.10971 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2511.10971 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2511.10971 |
| locations[1].id | doi:10.48550/arxiv.2511.10971 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2511.10971 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5035048973 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-0040-3796 |
| authorships[0].author.display_name | Shukai Duan |
| authorships[0].author_position | middle |
| authorships[0].raw_author_name | Duan, Shukai |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5066672910 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Shixuan Li |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Li, Shixuan |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5037487522 |
| authorships[2].author.orcid | https://orcid.org/0000-0001-6411-7441 |
| authorships[2].author.display_name | Chenzhong Yin |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Yin, Chenzhong |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5057367919 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-8070-6665 |
| authorships[3].author.display_name | Mingxi Cheng |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Cheng, Mingxi |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5011759850 |
| authorships[4].author.orcid | |
| authorships[4].author.display_name | Heng Ping |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Ping, Heng |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5074576951 |
| authorships[5].author.orcid | https://orcid.org/0000-0002-3146-6845 |
| authorships[5].author.display_name | Tamoghna Chattopadhyay |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Chattopadhyay, Tamoghna |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5017365268 |
| authorships[6].author.orcid | https://orcid.org/0000-0002-0046-4070 |
| authorships[6].author.display_name | Sophia I. Thomopoulos |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Thomopoulos, Sophia I |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5065681916 |
| authorships[7].author.orcid | |
| authorships[7].author.display_name | Shahin Nazarian |
| authorships[7].author_position | middle |
| authorships[7].raw_author_name | Nazarian, Shahin |
| authorships[7].is_corresponding | False |
| authorships[8].author.id | https://openalex.org/A5106485270 |
| authorships[8].author.orcid | |
| authorships[8].author.display_name | Paul Thompson |
| authorships[8].author_position | last |
| authorships[8].raw_author_name | Thompson, Paul |
| authorships[8].is_corresponding | False |
| authorships[9].author.id | https://openalex.org/A5105925385 |
| authorships[9].author.orcid | https://orcid.org/0000-0003-2118-0816 |
| authorships[9].author.display_name | Paul Bogdan |
| authorships[9].author_position | middle |
| authorships[9].raw_author_name | Bogdan, Paul |
| authorships[9].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2511.10971 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-11-18T00:00:00 |
| display_name | ERMoE: Eigen-Reparameterized Mixture-of-Experts for Stable Routing and Interpretable Specialization |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-28T11:37:54.144956 |
| primary_topic | |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2511.10971 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2511.10971 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2511.10971 |
| primary_location.id | pmh:oai:arXiv.org:2511.10971 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2511.10971 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2511.10971 |
| publication_date | 2025-11-14 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 64, 73, 160, 183 |
| abstract_inverted_index.3D | 161 |
| abstract_inverted_index.To | 57 |
| abstract_inverted_index.We | 134 |
| abstract_inverted_index.an | 83, 95 |
| abstract_inverted_index.as | 40, 87 |
| abstract_inverted_index.by | 6, 170 |
| abstract_inverted_index.in | 72 |
| abstract_inverted_index.on | 141 |
| abstract_inverted_index.to | 25, 105 |
| abstract_inverted_index.we | 61 |
| abstract_inverted_index.7\% | 173 |
| abstract_inverted_index.MRI | 162 |
| abstract_inverted_index.MoE | 66 |
| abstract_inverted_index.age | 167 |
| abstract_inverted_index.and | 19, 28, 31, 53, 77, 94, 111, 127, 144, 174, 196 |
| abstract_inverted_index.but | 10, 48 |
| abstract_inverted_index.can | 44 |
| abstract_inverted_index.for | 123, 187 |
| abstract_inverted_index.new | 184 |
| abstract_inverted_index.the | 88, 121, 129 |
| abstract_inverted_index.two | 12 |
| abstract_inverted_index.This | 98 |
| abstract_inverted_index.core | 13 |
| abstract_inverted_index.each | 20, 70 |
| abstract_inverted_index.face | 11 |
| abstract_inverted_index.hurt | 54 |
| abstract_inverted_index.load | 32, 46, 157 |
| abstract_inverted_index.more | 171 |
| abstract_inverted_index.need | 122 |
| abstract_inverted_index.show | 135 |
| abstract_inverted_index.such | 39 |
| abstract_inverted_index.than | 172 |
| abstract_inverted_index.that | 68, 136, 191 |
| abstract_inverted_index.they | 132 |
| abstract_inverted_index.thus | 181 |
| abstract_inverted_index.ties | 101 |
| abstract_inverted_index.with | 82, 200 |
| abstract_inverted_index.(MoE) | 1 |
| abstract_inverted_index.COCO, | 150 |
| abstract_inverted_index.ERMoE | 119, 137, 180 |
| abstract_inverted_index.brain | 166 |
| abstract_inverted_index.input | 92 |
| abstract_inverted_index.leads | 24 |
| abstract_inverted_index.model | 4 |
| abstract_inverted_index.often | 49 |
| abstract_inverted_index.these | 59 |
| abstract_inverted_index.token | 102 |
| abstract_inverted_index.while | 152 |
| abstract_inverted_index.(e.g., | 149 |
| abstract_inverted_index.ERMoE, | 63 |
| abstract_inverted_index.avoids | 128 |
| abstract_inverted_index.basis. | 97 |
| abstract_inverted_index.cosine | 89 |
| abstract_inverted_index.create | 34 |
| abstract_inverted_index.expand | 3 |
| abstract_inverted_index.expert | 29, 51, 71, 156, 178, 189 |
| abstract_inverted_index.gating | 80 |
| abstract_inverted_index.logits | 18, 81 |
| abstract_inverted_index.losses | 126 |
| abstract_inverted_index.models | 190 |
| abstract_inverted_index.reduce | 45 |
| abstract_inverted_index.router | 17 |
| abstract_inverted_index.sparse | 65, 188 |
| abstract_inverted_index.weaken | 50 |
| abstract_inverted_index.yields | 175 |
| abstract_inverted_index.Score", | 85 |
| abstract_inverted_index.address | 58 |
| abstract_inverted_index.between | 16, 91 |
| abstract_inverted_index.defined | 86 |
| abstract_inverted_index.enables | 197 |
| abstract_inverted_index.experts | 9 |
| abstract_inverted_index.flatter | 155 |
| abstract_inverted_index.issues, | 60 |
| abstract_inverted_index.learned | 74, 79 |
| abstract_inverted_index.losses, | 43 |
| abstract_inverted_index.propose | 62 |
| abstract_inverted_index.removes | 120 |
| abstract_inverted_index.routing | 27, 100, 194 |
| abstract_inverted_index.spaces, | 108 |
| abstract_inverted_index.variant | 163 |
| abstract_inverted_index.without | 115 |
| abstract_inverted_index.ImageNet | 142 |
| abstract_inverted_index.Standard | 37 |
| abstract_inverted_index.accuracy | 140, 169 |
| abstract_inverted_index.achieves | 138 |
| abstract_inverted_index.capacity | 5 |
| abstract_inverted_index.directly | 104, 192 |
| abstract_inverted_index.expert's | 21, 96 |
| abstract_inverted_index.experts' | 106 |
| abstract_inverted_index.explicit | 124 |
| abstract_inverted_index.features | 93 |
| abstract_inverted_index.improved | 198 |
| abstract_inverted_index.improves | 165 |
| abstract_inverted_index.internal | 22 |
| abstract_inverted_index.replaces | 78 |
| abstract_inverted_index.sparsely | 7 |
| abstract_inverted_index.unstable | 26 |
| abstract_inverted_index.Moreover, | 159 |
| abstract_inverted_index.addresses | 193 |
| abstract_inverted_index.auxiliary | 41 |
| abstract_inverted_index.balancing | 125 |
| abstract_inverted_index.gradients | 131 |
| abstract_inverted_index.naturally | 153 |
| abstract_inverted_index.principle | 186 |
| abstract_inverted_index.producing | 154 |
| abstract_inverted_index.promoting | 112 |
| abstract_inverted_index.retrieval | 147 |
| abstract_inverted_index.scalable, | 201 |
| abstract_inverted_index.sparsity. | 117 |
| abstract_inverted_index.straggler | 35 |
| abstract_inverted_index.structure | 23 |
| abstract_inverted_index.(ERMoE-ba) | 164 |
| abstract_inverted_index.Crucially, | 118 |
| abstract_inverted_index.activating | 8 |
| abstract_inverted_index.benchmarks | 148 |
| abstract_inverted_index.downstream | 55 |
| abstract_inverted_index.eigenbasis | 76 |
| abstract_inverted_index.image-text | 146 |
| abstract_inverted_index.imbalances | 33 |
| abstract_inverted_index.introduce. | 133 |
| abstract_inverted_index.introduces | 182 |
| abstract_inverted_index.prediction | 168 |
| abstract_inverted_index.similarity | 90 |
| abstract_inverted_index.solutions, | 38 |
| abstract_inverted_index."Eigenbasis | 84 |
| abstract_inverted_index.Flickr30K), | 151 |
| abstract_inverted_index.assignments | 103 |
| abstract_inverted_index.challenges: | 14 |
| abstract_inverted_index.cross-modal | 145 |
| abstract_inverted_index.disparities | 47 |
| abstract_inverted_index.interfering | 130 |
| abstract_inverted_index.orthonormal | 75 |
| abstract_inverted_index.performance | 199 |
| abstract_inverted_index.sacrificing | 116 |
| abstract_inverted_index.stabilizing | 109 |
| abstract_inverted_index.transformer | 67 |
| abstract_inverted_index.utilization | 110 |
| abstract_inverted_index.anatomically | 176 |
| abstract_inverted_index.bottlenecks. | 36 |
| abstract_inverted_index.misalignment | 15 |
| abstract_inverted_index.performance. | 56 |
| abstract_inverted_index.architectural | 185 |
| abstract_inverted_index.architectures | 2 |
| abstract_inverted_index.content-aware | 99 |
| abstract_inverted_index.instabilities | 195 |
| abstract_inverted_index.interpretable | 113, 177, 202 |
| abstract_inverted_index.classification | 143 |
| abstract_inverted_index.distributions. | 158 |
| abstract_inverted_index.load-balancing | 42 |
| abstract_inverted_index.representation | 107 |
| abstract_inverted_index.specialization | 52, 114 |
| abstract_inverted_index.reparameterizes | 69 |
| abstract_inverted_index.specialization. | 203 |
| abstract_inverted_index.specializations. | 179 |
| abstract_inverted_index.state-of-the-art | 139 |
| abstract_inverted_index.underutilization, | 30 |
| abstract_inverted_index.Mixture-of-Experts | 0 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 10 |
| citation_normalized_percentile |
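
The `abstract_inverted_index.*` rows in the payload store the abstract as a map from each token to the word positions where it occurs. A minimal sketch of turning such a map back into running text follows, assuming the index is available as a Python dict of token to position list. The helper name `reconstruct_abstract` is hypothetical, and the sample covers only positions 0 to 9 of the rows above (for `by`, only its first position is included).

```python
def reconstruct_abstract(inverted_index):
    # Invert the token -> positions map into position -> token.
    by_position = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    # Emit tokens in ascending position order, separated by single spaces.
    return " ".join(by_position[pos] for pos in sorted(by_position))

# Truncated sample taken from the table rows above (positions 0-9 only).
sample = {
    "Mixture-of-Experts": [0],
    "(MoE)": [1],
    "architectures": [2],
    "expand": [3],
    "model": [4],
    "capacity": [5],
    "by": [6],
    "sparsely": [7],
    "activating": [8],
    "experts": [9],
}

print(reconstruct_abstract(sample))
# prints: Mixture-of-Experts (MoE) architectures expand model capacity by sparsely activating experts
```

Tokens keep their original punctuation (for example `COCO,` or `7\%`), so the reconstructed text reproduces the abstract exactly as it was indexed.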