Unbiased Gradient Estimation with Balanced Assignments for Mixtures of Experts
2021 · Open Access · DOI: https://doi.org/10.48550/arxiv.2109.11817
Training large-scale mixture of experts models efficiently on modern hardware requires assigning datapoints in a batch to different experts, each with a limited capacity. Recently proposed assignment procedures lack a probabilistic interpretation and use biased estimators for training. As an alternative, we propose two unbiased estimators based on principled stochastic assignment procedures: one that skips datapoints which exceed expert capacity, and one that samples perfectly balanced assignments using an extension of the Gumbel-Matching distribution [29]. Both estimators are unbiased, as they correct for the sampling procedure used. On a toy experiment, we find the 'skip' estimator is more effective than the balanced sampling one, and both are more robust in solving the task than biased alternatives.
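The idea of skipping datapoints that exceed expert capacity, and then correcting for that skipping, can be illustrated with a minimal sketch. This is not the paper's estimator: the function name `skip_assign`, the uniform choice of which overflow datapoints to keep, and the inverse-probability weight `n_j / capacity` are all assumptions made here for illustration. Under those assumptions, a datapoint routed to an expert that received `n_j > capacity` datapoints is kept with probability `capacity / n_j`, so weighting kept datapoints by the inverse leaves the expected weighted sum of per-datapoint terms unchanged, given the sampled assignment.

```python
import numpy as np


def skip_assign(probs, capacity, rng):
    """Sample one expert per datapoint, then skip overflow uniformly at random.

    probs: (n, k) array of routing probabilities, each row summing to 1.
    Returns (experts, kept, weights). A kept datapoint routed to an expert that
    received n_j > capacity datapoints gets weight n_j / capacity (the inverse
    of its keep probability); other kept datapoints get 1; skipped ones get 0.
    Hypothetical helper for illustration, not taken from the paper.
    """
    n, k = probs.shape
    # Independent categorical sample of an expert for every datapoint.
    experts = np.array([rng.choice(k, p=row) for row in probs])
    kept = np.zeros(n, dtype=bool)
    weights = np.zeros(n)
    for j in range(k):
        idx = np.flatnonzero(experts == j)
        if len(idx) > capacity:
            # Overfull expert: keep a uniformly random subset of size capacity.
            chosen = rng.choice(idx, size=capacity, replace=False)
            kept[chosen] = True
            weights[chosen] = len(idx) / capacity
        else:
            kept[idx] = True
            weights[idx] = 1.0
    return experts, kept, weights


rng = np.random.default_rng(0)
probs = np.tile([0.7, 0.2, 0.1], (8, 1))  # 8 datapoints, 3 experts
experts, kept, weights = skip_assign(probs, capacity=3, rng=rng)
```

The weights here only correct for the skipping step. Differentiating through the routing probabilities themselves would need an additional estimator term, which this sketch does not attempt.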
Related Topics
- Type: preprint
- Landing Page: http://arxiv.org/abs/2109.11817, https://arxiv.org/pdf/2109.11817
- OA Status: green
- Cited By: 2
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4321319232
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4321319232
- DOI: https://doi.org/10.48550/arxiv.2109.11817
- Title: Unbiased Gradient Estimation with Balanced Assignments for Mixtures of Experts
- Type: preprint
- Publication year: 2021
- Publication date: 2021-09-24
- Authors: Wouter Kool, Chris J. Maddison, Andriy Mnih
- Landing page: https://arxiv.org/abs/2109.11817
- PDF URL: https://arxiv.org/pdf/2109.11817
- Open access: Yes
- OA status: green
- OA URL: https://arxiv.org/pdf/2109.11817
- Concepts: Estimator, Computer science, Gumbel distribution, Matching (statistics), Sampling (signal processing), Task (project management), Probabilistic logic, Artificial intelligence, Mathematics, Statistics, Engineering, Extreme value theory, Systems engineering, Computer vision, Filter (signal processing)
- Cited by: 2
- Citations by year (recent): 2023: 1, 2022: 1
- Related works (count): 10
Full payload
| field | value |
|---|---|
| id | https://openalex.org/W4321319232 |
| doi | https://doi.org/10.48550/arxiv.2109.11817 |
| ids.openalex | https://openalex.org/W4321319232 |
| fwci | 0.28220715 |
| type | preprint |
| title | Unbiased Gradient Estimation with Balanced Assignments for Mixtures of Experts |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T12072 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9995999932289124 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Machine Learning and Algorithms |
| topics[1].id | https://openalex.org/T12101 |
| topics[1].field.id | https://openalex.org/fields/18 |
| topics[1].field.display_name | Decision Sciences |
| topics[1].score | 0.9983999729156494 |
| topics[1].domain.id | https://openalex.org/domains/2 |
| topics[1].domain.display_name | Social Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1803 |
| topics[1].subfield.display_name | Management Science and Operations Research |
| topics[1].display_name | Advanced Bandit Algorithms Research |
| topics[2].id | https://openalex.org/T12535 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9973000288009644 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Machine Learning and Data Classification |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C185429906 |
| concepts[0].level | 2 |
| concepts[0].score | 0.7770322561264038 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q1130160 |
| concepts[0].display_name | Estimator |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.6537231802940369 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C137610916 |
| concepts[2].level | 3 |
| concepts[2].score | 0.6052733063697815 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q1096862 |
| concepts[2].display_name | Gumbel distribution |
| concepts[3].id | https://openalex.org/C165064840 |
| concepts[3].level | 2 |
| concepts[3].score | 0.6027845144271851 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q1321061 |
| concepts[3].display_name | Matching (statistics) |
| concepts[4].id | https://openalex.org/C140779682 |
| concepts[4].level | 3 |
| concepts[4].score | 0.5628892183303833 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q210868 |
| concepts[4].display_name | Sampling (signal processing) |
| concepts[5].id | https://openalex.org/C2780451532 |
| concepts[5].level | 2 |
| concepts[5].score | 0.5563127398490906 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q759676 |
| concepts[5].display_name | Task (project management) |
| concepts[6].id | https://openalex.org/C49937458 |
| concepts[6].level | 2 |
| concepts[6].score | 0.46718600392341614 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q2599292 |
| concepts[6].display_name | Probabilistic logic |
| concepts[7].id | https://openalex.org/C154945302 |
| concepts[7].level | 1 |
| concepts[7].score | 0.3187313377857208 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[7].display_name | Artificial intelligence |
| concepts[8].id | https://openalex.org/C33923547 |
| concepts[8].level | 0 |
| concepts[8].score | 0.23093977570533752 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[8].display_name | Mathematics |
| concepts[9].id | https://openalex.org/C105795698 |
| concepts[9].level | 1 |
| concepts[9].score | 0.2290884256362915 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q12483 |
| concepts[9].display_name | Statistics |
| concepts[10].id | https://openalex.org/C127413603 |
| concepts[10].level | 0 |
| concepts[10].score | 0.0775367021560669 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q11023 |
| concepts[10].display_name | Engineering |
| concepts[11].id | https://openalex.org/C147581598 |
| concepts[11].level | 2 |
| concepts[11].score | 0.06736770272254944 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q729429 |
| concepts[11].display_name | Extreme value theory |
| concepts[12].id | https://openalex.org/C201995342 |
| concepts[12].level | 1 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q682496 |
| concepts[12].display_name | Systems engineering |
| concepts[13].id | https://openalex.org/C31972630 |
| concepts[13].level | 1 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q844240 |
| concepts[13].display_name | Computer vision |
| concepts[14].id | https://openalex.org/C106131492 |
| concepts[14].level | 2 |
| concepts[14].score | 0.0 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q3072260 |
| concepts[14].display_name | Filter (signal processing) |
| keywords[0].id | https://openalex.org/keywords/estimator |
| keywords[0].score | 0.7770322561264038 |
| keywords[0].display_name | Estimator |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.6537231802940369 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/gumbel-distribution |
| keywords[2].score | 0.6052733063697815 |
| keywords[2].display_name | Gumbel distribution |
| keywords[3].id | https://openalex.org/keywords/matching |
| keywords[3].score | 0.6027845144271851 |
| keywords[3].display_name | Matching (statistics) |
| keywords[4].id | https://openalex.org/keywords/sampling |
| keywords[4].score | 0.5628892183303833 |
| keywords[4].display_name | Sampling (signal processing) |
| keywords[5].id | https://openalex.org/keywords/task |
| keywords[5].score | 0.5563127398490906 |
| keywords[5].display_name | Task (project management) |
| keywords[6].id | https://openalex.org/keywords/probabilistic-logic |
| keywords[6].score | 0.46718600392341614 |
| keywords[6].display_name | Probabilistic logic |
| keywords[7].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[7].score | 0.3187313377857208 |
| keywords[7].display_name | Artificial intelligence |
| keywords[8].id | https://openalex.org/keywords/mathematics |
| keywords[8].score | 0.23093977570533752 |
| keywords[8].display_name | Mathematics |
| keywords[9].id | https://openalex.org/keywords/statistics |
| keywords[9].score | 0.2290884256362915 |
| keywords[9].display_name | Statistics |
| keywords[10].id | https://openalex.org/keywords/engineering |
| keywords[10].score | 0.0775367021560669 |
| keywords[10].display_name | Engineering |
| keywords[11].id | https://openalex.org/keywords/extreme-value-theory |
| keywords[11].score | 0.06736770272254944 |
| keywords[11].display_name | Extreme value theory |
| language | |
| locations[0].id | pmh:oai:arXiv.org:2109.11817 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2109.11817 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2109.11817 |
| indexed_in | arxiv |
| authorships[0].author.id | https://openalex.org/A5103284134 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-1837-1454 |
| authorships[0].author.display_name | Wouter Kool |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Kool, Wouter |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5054711904 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Chris J. Maddison |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Maddison, Chris J. |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5079370148 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Andriy Mnih |
| authorships[2].author_position | last |
| authorships[2].raw_author_name | Mnih, Andriy |
| authorships[2].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2109.11817 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2023-02-19T00:00:00 |
| display_name | Unbiased Gradient Estimation with Balanced Assignments for Mixtures of Experts |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T03:46:38.306776 |
| primary_topic.id | https://openalex.org/T12072 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9995999932289124 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Machine Learning and Algorithms |
| related_works | https://openalex.org/W2115040659, https://openalex.org/W2392757156, https://openalex.org/W3121924949, https://openalex.org/W2951988075, https://openalex.org/W2270643620, https://openalex.org/W1570428685, https://openalex.org/W2083778309, https://openalex.org/W4225649502, https://openalex.org/W1498453022, https://openalex.org/W1987537206 |
| cited_by_count | 2 |
| counts_by_year[0].year | 2023 |
| counts_by_year[0].cited_by_count | 1 |
| counts_by_year[1].year | 2022 |
| counts_by_year[1].cited_by_count | 1 |
| locations_count | 1 |
| best_oa_location.id | pmh:oai:arXiv.org:2109.11817 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2109.11817 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2109.11817 |
| primary_location.id | pmh:oai:arXiv.org:2109.11817 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2109.11817 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2109.11817 |
| publication_date | 2021-09-24 |
| publication_year | 2021 |
| referenced_works_count | 0 |
| cited_by_percentile_year.max | 94 |
| cited_by_percentile_year.min | 89 |
| countries_distinct_count | 0 |
| institutions_distinct_count | 3 |
| citation_normalized_percentile.value | 0.66927477 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | False |