A General Theory for Softmax Gating Multinomial Logistic Mixture of Experts
2023 · Open Access · DOI: https://doi.org/10.48550/arxiv.2310.14188
The mixture-of-experts (MoE) model combines multiple submodels via gating functions to achieve better performance in numerous regression and classification applications. From a theoretical perspective, previous work has sought to understand the model's behavior in the regression setting through convergence analysis of maximum likelihood estimation in the Gaussian MoE model, but a corresponding analysis for the classification setting has been missing from the literature. We close this gap by establishing the convergence rates of density estimation and parameter estimation in the softmax gating multinomial logistic MoE model. Notably, when part of the expert parameters vanish, these rates are shown to be slower than polynomial rates owing to an inherent interaction between the softmax gating and expert functions via partial differential equations. To address this issue, we propose a novel class of modified softmax gating functions that transform the input before delivering it to the gating functions. As a result, the previous interaction disappears and the parameter estimation rates are significantly improved.
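To make the model in the abstract concrete, the following is a minimal sketch of the class probabilities produced by a softmax-gated mixture of multinomial logistic experts. It assumes a standard parameterization in which both the gate and each expert are linear in the input; the function and parameter names (`moe_class_probs`, `gate_w`, `expert_w`, `transform`) are illustrative and do not come from the paper. The optional `transform` argument only illustrates the idea of transforming the input before it reaches the gate; the `tanh` map used in the example is a hypothetical choice, and the paper's conditions on admissible transformations are not reproduced here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def moe_class_probs(x, gate_w, gate_b, expert_w, expert_b, transform=None):
    """Class probabilities P(Y = s | x) for a softmax-gated mixture of
    multinomial logistic experts (assumed parameterization).

    gate_w[i], gate_b[i]            : gating weights and bias of expert i
    expert_w[i][s], expert_b[i][s]  : weights and bias of class s in expert i
    transform                       : optional map applied to x for the gate only
    """
    gate_in = transform(x) if transform is not None else x
    gates = softmax([dot(w, gate_in) + b for w, b in zip(gate_w, gate_b)])

    n_classes = len(expert_b[0])
    probs = [0.0] * n_classes
    for g, w_i, b_i in zip(gates, expert_w, expert_b):
        # Each expert is itself a multinomial logistic (softmax) classifier.
        expert_probs = softmax([dot(w, x) + b for w, b in zip(w_i, b_i)])
        for s in range(n_classes):
            probs[s] += g * expert_probs[s]
    return probs


# Toy example: 2 experts, 3 classes, 2-dimensional input (made-up numbers).
x = [0.5, -1.0]
gate_w = [[1.0, 0.0], [-1.0, 0.5]]
gate_b = [0.0, 0.1]
expert_w = [
    [[0.2, -0.3], [0.0, 0.0], [-0.5, 0.4]],
    [[1.0, 1.0], [0.0, 0.0], [0.3, -0.2]],
]
expert_b = [[0.0, 0.1, -0.1], [0.2, 0.0, 0.0]]

probs = moe_class_probs(x, gate_w, gate_b, expert_w, expert_b)
probs_modified = moe_class_probs(
    x, gate_w, gate_b, expert_w, expert_b,
    transform=lambda v: [math.tanh(t) for t in v],  # hypothetical transform
)
```

In both calls the output is a convex combination of the experts' class distributions, so the entries are positive and sum to one. The only difference between the two calls is what the gate sees, which is the part of the model the abstract proposes to modify.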
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2310.14188 (PDF: https://arxiv.org/pdf/2310.14188)
- OA Status: green
- Cited By: 2
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4387928887
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4387928887 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2310.14188 (Digital Object Identifier)
- Title: A General Theory for Softmax Gating Multinomial Logistic Mixture of Experts (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2023
- Publication date: 2023-10-22 (full publication date if available)
- Authors: Huy Hung Nguyen, Pedram Akbarian, TrungTin Nguyen, Nhat Ho (list of authors in order)
- Landing page: https://arxiv.org/abs/2310.14188 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2310.14188 (direct link to full-text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2310.14188 (direct OA link when available)
- Concepts: Softmax function, Gating, Multinomial logistic regression, Multinomial distribution, Sketch, Mixture model, Logistic regression, Mathematics, Computer science, Applied mathematics, Convergence (economics), Artificial intelligence, Econometrics, Statistics, Pattern recognition (psychology), Algorithm, Artificial neural network, Biology, Economics, Economic growth, Physiology (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 2 (total citation count in OpenAlex)
- Citations by year (recent): 2024: 2 (per-year citation counts, last 5 years)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4387928887 |
| doi | https://doi.org/10.48550/arxiv.2310.14188 |
| ids.doi | https://doi.org/10.48550/arxiv.2310.14188 |
| ids.openalex | https://openalex.org/W4387928887 |
| fwci | |
| type | preprint |
| title | A General Theory for Softmax Gating Multinomial Logistic Mixture of Experts |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T13030 |
| topics[0].field.id | https://openalex.org/fields/26 |
| topics[0].field.display_name | Mathematics |
| topics[0].score | 0.9818000197410583 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/2613 |
| topics[0].subfield.display_name | Statistics and Probability |
| topics[0].display_name | Survey Sampling and Estimation Techniques |
| topics[1].id | https://openalex.org/T11901 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9692999720573425 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Bayesian Methods and Mixture Models |
| topics[2].id | https://openalex.org/T10410 |
| topics[2].field.id | https://openalex.org/fields/26 |
| topics[2].field.display_name | Mathematics |
| topics[2].score | 0.9661999940872192 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/2611 |
| topics[2].subfield.display_name | Modeling and Simulation |
| topics[2].display_name | COVID-19 epidemiological studies |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C188441871 |
| concepts[0].level | 3 |
| concepts[0].score | 0.9772422313690186 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q7554146 |
| concepts[0].display_name | Softmax function |
| concepts[1].id | https://openalex.org/C194544171 |
| concepts[1].level | 2 |
| concepts[1].score | 0.8363485336303711 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21105679 |
| concepts[1].display_name | Gating |
| concepts[2].id | https://openalex.org/C117568660 |
| concepts[2].level | 2 |
| concepts[2].score | 0.7750283479690552 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q1650843 |
| concepts[2].display_name | Multinomial logistic regression |
| concepts[3].id | https://openalex.org/C192065140 |
| concepts[3].level | 2 |
| concepts[3].score | 0.7210543155670166 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q1147928 |
| concepts[3].display_name | Multinomial distribution |
| concepts[4].id | https://openalex.org/C2779231336 |
| concepts[4].level | 2 |
| concepts[4].score | 0.6031673550605774 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q7534724 |
| concepts[4].display_name | Sketch |
| concepts[5].id | https://openalex.org/C61224824 |
| concepts[5].level | 2 |
| concepts[5].score | 0.47400057315826416 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q2260434 |
| concepts[5].display_name | Mixture model |
| concepts[6].id | https://openalex.org/C151956035 |
| concepts[6].level | 2 |
| concepts[6].score | 0.4699421525001526 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q1132755 |
| concepts[6].display_name | Logistic regression |
| concepts[7].id | https://openalex.org/C33923547 |
| concepts[7].level | 0 |
| concepts[7].score | 0.4692751467227936 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[7].display_name | Mathematics |
| concepts[8].id | https://openalex.org/C41008148 |
| concepts[8].level | 0 |
| concepts[8].score | 0.4277491867542267 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[8].display_name | Computer science |
| concepts[9].id | https://openalex.org/C28826006 |
| concepts[9].level | 1 |
| concepts[9].score | 0.4211881756782532 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q33521 |
| concepts[9].display_name | Applied mathematics |
| concepts[10].id | https://openalex.org/C2777303404 |
| concepts[10].level | 2 |
| concepts[10].score | 0.41203564405441284 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q759757 |
| concepts[10].display_name | Convergence (economics) |
| concepts[11].id | https://openalex.org/C154945302 |
| concepts[11].level | 1 |
| concepts[11].score | 0.41201600432395935 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[11].display_name | Artificial intelligence |
| concepts[12].id | https://openalex.org/C149782125 |
| concepts[12].level | 1 |
| concepts[12].score | 0.3631240129470825 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q160039 |
| concepts[12].display_name | Econometrics |
| concepts[13].id | https://openalex.org/C105795698 |
| concepts[13].level | 1 |
| concepts[13].score | 0.3432292938232422 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q12483 |
| concepts[13].display_name | Statistics |
| concepts[14].id | https://openalex.org/C153180895 |
| concepts[14].level | 2 |
| concepts[14].score | 0.3315144181251526 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q7148389 |
| concepts[14].display_name | Pattern recognition (psychology) |
| concepts[15].id | https://openalex.org/C11413529 |
| concepts[15].level | 1 |
| concepts[15].score | 0.311140239238739 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q8366 |
| concepts[15].display_name | Algorithm |
| concepts[16].id | https://openalex.org/C50644808 |
| concepts[16].level | 2 |
| concepts[16].score | 0.14648672938346863 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q192776 |
| concepts[16].display_name | Artificial neural network |
| concepts[17].id | https://openalex.org/C86803240 |
| concepts[17].level | 0 |
| concepts[17].score | 0.0 |
| concepts[17].wikidata | https://www.wikidata.org/wiki/Q420 |
| concepts[17].display_name | Biology |
| concepts[18].id | https://openalex.org/C162324750 |
| concepts[18].level | 0 |
| concepts[18].score | 0.0 |
| concepts[18].wikidata | https://www.wikidata.org/wiki/Q8134 |
| concepts[18].display_name | Economics |
| concepts[19].id | https://openalex.org/C50522688 |
| concepts[19].level | 1 |
| concepts[19].score | 0.0 |
| concepts[19].wikidata | https://www.wikidata.org/wiki/Q189833 |
| concepts[19].display_name | Economic growth |
| concepts[20].id | https://openalex.org/C42407357 |
| concepts[20].level | 1 |
| concepts[20].score | 0.0 |
| concepts[20].wikidata | https://www.wikidata.org/wiki/Q521 |
| concepts[20].display_name | Physiology |
| keywords[0].id | https://openalex.org/keywords/softmax-function |
| keywords[0].score | 0.9772422313690186 |
| keywords[0].display_name | Softmax function |
| keywords[1].id | https://openalex.org/keywords/gating |
| keywords[1].score | 0.8363485336303711 |
| keywords[1].display_name | Gating |
| keywords[2].id | https://openalex.org/keywords/multinomial-logistic-regression |
| keywords[2].score | 0.7750283479690552 |
| keywords[2].display_name | Multinomial logistic regression |
| keywords[3].id | https://openalex.org/keywords/multinomial-distribution |
| keywords[3].score | 0.7210543155670166 |
| keywords[3].display_name | Multinomial distribution |
| keywords[4].id | https://openalex.org/keywords/sketch |
| keywords[4].score | 0.6031673550605774 |
| keywords[4].display_name | Sketch |
| keywords[5].id | https://openalex.org/keywords/mixture-model |
| keywords[5].score | 0.47400057315826416 |
| keywords[5].display_name | Mixture model |
| keywords[6].id | https://openalex.org/keywords/logistic-regression |
| keywords[6].score | 0.4699421525001526 |
| keywords[6].display_name | Logistic regression |
| keywords[7].id | https://openalex.org/keywords/mathematics |
| keywords[7].score | 0.4692751467227936 |
| keywords[7].display_name | Mathematics |
| keywords[8].id | https://openalex.org/keywords/computer-science |
| keywords[8].score | 0.4277491867542267 |
| keywords[8].display_name | Computer science |
| keywords[9].id | https://openalex.org/keywords/applied-mathematics |
| keywords[9].score | 0.4211881756782532 |
| keywords[9].display_name | Applied mathematics |
| keywords[10].id | https://openalex.org/keywords/convergence |
| keywords[10].score | 0.41203564405441284 |
| keywords[10].display_name | Convergence (economics) |
| keywords[11].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[11].score | 0.41201600432395935 |
| keywords[11].display_name | Artificial intelligence |
| keywords[12].id | https://openalex.org/keywords/econometrics |
| keywords[12].score | 0.3631240129470825 |
| keywords[12].display_name | Econometrics |
| keywords[13].id | https://openalex.org/keywords/statistics |
| keywords[13].score | 0.3432292938232422 |
| keywords[13].display_name | Statistics |
| keywords[14].id | https://openalex.org/keywords/pattern-recognition |
| keywords[14].score | 0.3315144181251526 |
| keywords[14].display_name | Pattern recognition (psychology) |
| keywords[15].id | https://openalex.org/keywords/algorithm |
| keywords[15].score | 0.311140239238739 |
| keywords[15].display_name | Algorithm |
| keywords[16].id | https://openalex.org/keywords/artificial-neural-network |
| keywords[16].score | 0.14648672938346863 |
| keywords[16].display_name | Artificial neural network |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2310.14188 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2310.14188 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2310.14188 |
| locations[1].id | pmh:oai:HAL:hal-04256824v1 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306402512 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | False |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | HAL (Le Centre pour la Communication Scientifique Directe) |
| locations[1].source.host_organization | https://openalex.org/I1294671590 |
| locations[1].source.host_organization_name | Centre National de la Recherche Scientifique |
| locations[1].source.host_organization_lineage | https://openalex.org/I1294671590 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | submittedVersion |
| locations[1].raw_type | Preprints, Working Papers, ... |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | False |
| locations[1].raw_source_name | 2023 |
| locations[1].landing_page_url | https://hal.science/hal-04256824 |
| locations[2].id | doi:10.48550/arxiv.2310.14188 |
| locations[2].is_oa | True |
| locations[2].source.id | https://openalex.org/S4306400194 |
| locations[2].source.issn | |
| locations[2].source.type | repository |
| locations[2].source.is_oa | True |
| locations[2].source.issn_l | |
| locations[2].source.is_core | False |
| locations[2].source.is_in_doaj | False |
| locations[2].source.display_name | arXiv (Cornell University) |
| locations[2].source.host_organization | https://openalex.org/I205783295 |
| locations[2].source.host_organization_name | Cornell University |
| locations[2].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[2].license | |
| locations[2].pdf_url | |
| locations[2].version | |
| locations[2].raw_type | article |
| locations[2].license_id | |
| locations[2].is_accepted | False |
| locations[2].is_published | |
| locations[2].raw_source_name | |
| locations[2].landing_page_url | https://doi.org/10.48550/arxiv.2310.14188 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5038574766 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-4779-8887 |
| authorships[0].author.display_name | Huy Hung Nguyen |
| authorships[0].countries | US |
| authorships[0].affiliations[0].institution_ids | https://openalex.org/I86519309 |
| authorships[0].affiliations[0].raw_affiliation_string | University of Texas at Austin [Austin] |
| authorships[0].institutions[0].id | https://openalex.org/I86519309 |
| authorships[0].institutions[0].ror | https://ror.org/00hj54h04 |
| authorships[0].institutions[0].type | education |
| authorships[0].institutions[0].lineage | https://openalex.org/I86519309 |
| authorships[0].institutions[0].country_code | US |
| authorships[0].institutions[0].display_name | The University of Texas at Austin |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Nguyen, Huy |
| authorships[0].is_corresponding | False |
| authorships[0].raw_affiliation_strings | University of Texas at Austin [Austin] |
| authorships[1].author.id | https://openalex.org/A5092950223 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Pedram Akbarian |
| authorships[1].countries | US |
| authorships[1].affiliations[0].institution_ids | https://openalex.org/I86519309 |
| authorships[1].affiliations[0].raw_affiliation_string | University of Texas at Austin [Austin] |
| authorships[1].institutions[0].id | https://openalex.org/I86519309 |
| authorships[1].institutions[0].ror | https://ror.org/00hj54h04 |
| authorships[1].institutions[0].type | education |
| authorships[1].institutions[0].lineage | https://openalex.org/I86519309 |
| authorships[1].institutions[0].country_code | US |
| authorships[1].institutions[0].display_name | The University of Texas at Austin |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Akbarian, Pedram |
| authorships[1].is_corresponding | False |
| authorships[1].raw_affiliation_strings | University of Texas at Austin [Austin] |
| authorships[2].author.id | https://openalex.org/A5039728717 |
| authorships[2].author.orcid | https://orcid.org/0000-0001-8433-5980 |
| authorships[2].author.display_name | TrungTin Nguyen |
| authorships[2].affiliations[0].raw_affiliation_string | Modèles statistiques bayésiens et des valeurs extrêmes pour données structurées et de grande dimension |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Nguyen, TrungTin |
| authorships[2].is_corresponding | False |
| authorships[2].raw_affiliation_strings | Modèles statistiques bayésiens et des valeurs extrêmes pour données structurées et de grande dimension |
| authorships[3].author.id | https://openalex.org/A5112412955 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Nhat Ho |
| authorships[3].countries | US |
| authorships[3].affiliations[0].institution_ids | https://openalex.org/I86519309 |
| authorships[3].affiliations[0].raw_affiliation_string | University of Texas at Austin [Austin] |
| authorships[3].institutions[0].id | https://openalex.org/I86519309 |
| authorships[3].institutions[0].ror | https://ror.org/00hj54h04 |
| authorships[3].institutions[0].type | education |
| authorships[3].institutions[0].lineage | https://openalex.org/I86519309 |
| authorships[3].institutions[0].country_code | US |
| authorships[3].institutions[0].display_name | The University of Texas at Austin |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Ho, Nhat |
| authorships[3].is_corresponding | False |
| authorships[3].raw_affiliation_strings | University of Texas at Austin [Austin] |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2310.14188 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2023-10-25T00:00:00 |
| display_name | A General Theory for Softmax Gating Multinomial Logistic Mixture of Experts |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T13030 |
| primary_topic.field.id | https://openalex.org/fields/26 |
| primary_topic.field.display_name | Mathematics |
| primary_topic.score | 0.9818000197410583 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/2613 |
| primary_topic.subfield.display_name | Statistics and Probability |
| primary_topic.display_name | Survey Sampling and Estimation Techniques |
| related_works | https://openalex.org/W3107204728, https://openalex.org/W4287591324, https://openalex.org/W3108503355, https://openalex.org/W2980176872, https://openalex.org/W4226420367, https://openalex.org/W3098841390, https://openalex.org/W1504770579, https://openalex.org/W2784774275, https://openalex.org/W2184978910, https://openalex.org/W1917858188 |
| cited_by_count | 2 |
| counts_by_year[0].year | 2024 |
| counts_by_year[0].cited_by_count | 2 |
| locations_count | 3 |
| best_oa_location.id | pmh:oai:arXiv.org:2310.14188 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2310.14188 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2310.14188 |
| primary_location.id | pmh:oai:arXiv.org:2310.14188 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2310.14188 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2310.14188 |
| publication_date | 2023-10-22 |
| publication_year | 2023 |
| referenced_works_count | 0 |
| cited_by_percentile_year | |
| countries_distinct_count | 1 |
| institutions_distinct_count | 4 |
| citation_normalized_percentile |