Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2410.10846
Large Language Models (LLMs) typically generate outputs token by token using a fixed compute budget, leading to inefficient resource utilization. To address this shortcoming, recent advancements in mixture-of-experts (MoE) models, speculative decoding, and early-exit strategies leverage the insight that computational demands can vary significantly based on the complexity and nature of the input. However, identifying optimal routing patterns for dynamic execution remains an open challenge, limiting the full potential of these adaptive methods. To address this need, we study adaptive computation in LLMs more systematically. We propose a novel framework that integrates smaller auxiliary modules within each Feed-Forward Network layer of the LLM. This design enables dynamic routing of tokens based on task complexity: tokens can be processed by either the small or big modules at each layer, or even bypass certain layers entirely. This allows us to introduce a novel notion of a token's difficulty, defined by its potential to benefit from additional computational resources. Importantly, by employing oracles to identify optimal patterns of adaptive computations, we gain valuable insights into the internal workings of LLMs and the routing processes in a simplified heterogeneous MoE setup. We show that trained routers operate differently from oracles and often yield suboptimal solutions. Notably, activating a large module in just one layer outperforms models that use large modules across all layers, underscoring the gap between practical implementations of routing in MoE models and theoretical optima for adaptive computation.
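The per-layer choice described in the abstract (small module, big module, or skip the layer) and the oracle search over routing patterns can be illustrated with a toy sketch. Everything below is an assumption for illustration, not the authors' implementation: the tiny random ReLU FFNs, the hidden sizes, the cost units, the squared-error loss, and the names `DuoLayer`, `oracle`, and `difficulty` are all hypothetical. The oracle brute-forces every per-layer route under a compute budget, and the difficulty score measures how much the best achievable loss drops when the budget grows, mirroring the abstract's notion of a token's potential to benefit from additional compute.

```python
import itertools
import random

# Hypothetical per-layer routing choices and compute costs (assumed units:
# skipping is free, the small FFN costs 1, the big FFN costs 4).
SKIP, SMALL, BIG = "skip", "small", "big"
COST = {SKIP: 0, SMALL: 1, BIG: 4}


def make_ffn(dim, hidden, rng):
    """Toy two-layer ReLU FFN with random weights, standing in for a trained module."""
    w1 = [[rng.gauss(0, 0.3) for _ in range(dim)] for _ in range(hidden)]
    w2 = [[rng.gauss(0, 0.3) for _ in range(hidden)] for _ in range(dim)]

    def ffn(x):
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]
        return [sum(w * hi for w, hi in zip(row, h)) for row in w2]

    return ffn


class DuoLayer:
    """Hypothetical layer holding a big FFN and a smaller auxiliary FFN."""

    def __init__(self, dim, rng, small_hidden=2, big_hidden=16):
        self.small = make_ffn(dim, small_hidden, rng)
        self.big = make_ffn(dim, big_hidden, rng)

    def forward(self, x, route):
        if route == SKIP:
            return x  # bypass: the residual stream passes through unchanged
        ffn = self.small if route == SMALL else self.big
        return [xi + yi for xi, yi in zip(x, ffn(x))]  # residual connection


def run(layers, x, routes):
    """Apply each layer with its assigned route (one route per layer)."""
    for layer, route in zip(layers, routes):
        x = layer.forward(x, route)
    return x


def loss(y, target):
    """Squared error, used here as a stand-in for the language-modeling loss."""
    return sum((a - b) ** 2 for a, b in zip(y, target))


def oracle(layers, x, target, budget):
    """Brute-force oracle: try all 3**L routes within the budget, keep the lowest loss."""
    best = None
    for routes in itertools.product((SKIP, SMALL, BIG), repeat=len(layers)):
        if sum(COST[r] for r in routes) > budget:
            continue
        value = loss(run(layers, x, routes), target)
        if best is None or value < best[0]:
            best = (value, routes)
    return best  # never None for budget >= 0: the all-skip route costs 0


def difficulty(layers, x, target, low_budget, high_budget):
    """Hypothetical difficulty score: oracle loss drop when the budget grows.

    Non-negative whenever high_budget >= low_budget, since the larger budget
    admits every route the smaller one does.
    """
    low_loss, _ = oracle(layers, x, target, low_budget)
    high_loss, _ = oracle(layers, x, target, high_budget)
    return low_loss - high_loss


if __name__ == "__main__":
    rng = random.Random(0)
    layers = [DuoLayer(dim=4, rng=rng) for _ in range(3)]
    x = [rng.gauss(0, 1) for _ in range(4)]
    target = [rng.gauss(0, 1) for _ in range(4)]
    best_loss, best_routes = oracle(layers, x, target, budget=4)
    print("oracle routes under budget 4:", best_routes, round(best_loss, 4))
    print("difficulty (budget 3 vs 12):", difficulty(layers, x, target, 3, 12))
```

Exhaustive search costs 3^L forward passes per token, so it only works for short stacks or as an offline analysis tool. A trained router replaces it with a cheap learned decision, and the abstract reports that such routers often fall short of the oracle.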
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2410.10846, https://arxiv.org/pdf/2410.10846
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4403572882
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4403572882 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2410.10846 (Digital Object Identifier)
- Title: Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-10-01 (full publication date if available)
- Authors: Keivan Alizadeh, Iman Mirzadeh, Hooman Shahrokhi, Dmitry Belenko, F.W. Sun, Minsik Cho, Mohammad Hossein Sekhavat, Moin Nabi, Mehrdad Farajtabar (list of authors in order)
- Landing page: https://arxiv.org/abs/2410.10846 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2410.10846 (direct link to full-text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2410.10846 (direct OA link when available)
- Concepts: Computation, Computer science, Programming language, Theoretical computer science (top concepts, i.e. fields or topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| field | value |
|---|---|
| id | https://openalex.org/W4403572882 |
| doi | https://doi.org/10.48550/arxiv.2410.10846 |
| ids.doi | https://doi.org/10.48550/arxiv.2410.10846 |
| ids.openalex | https://openalex.org/W4403572882 |
| fwci | |
| type | preprint |
| title | Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10028 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9728999733924866 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Topic Modeling |
| topics[1].id | https://openalex.org/T10181 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9276999831199646 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Natural Language Processing Techniques |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C45374587 |
| concepts[0].level | 2 |
| concepts[0].score | 0.7299444675445557 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q12525525 |
| concepts[0].display_name | Computation |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.6983180046081543 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C199360897 |
| concepts[2].level | 1 |
| concepts[2].score | 0.3667985796928406 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q9143 |
| concepts[2].display_name | Programming language |
| concepts[3].id | https://openalex.org/C80444323 |
| concepts[3].level | 1 |
| concepts[3].score | 0.3438342809677124 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q2878974 |
| concepts[3].display_name | Theoretical computer science |
| keywords[0].id | https://openalex.org/keywords/computation |
| keywords[0].score | 0.7299444675445557 |
| keywords[0].display_name | Computation |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.6983180046081543 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/programming-language |
| keywords[2].score | 0.3667985796928406 |
| keywords[2].display_name | Programming language |
| keywords[3].id | https://openalex.org/keywords/theoretical-computer-science |
| keywords[3].score | 0.3438342809677124 |
| keywords[3].display_name | Theoretical computer science |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2410.10846 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2410.10846 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2410.10846 |
| locations[1].id | doi:10.48550/arxiv.2410.10846 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2410.10846 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5030482460 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Keivan Alizadeh |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Alizadeh, Keivan |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5079412282 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Iman Mirzadeh |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Mirzadeh, Iman |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5109022949 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Hooman Shahrokhi |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Shahrokhi, Hooman |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5093547721 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Dmitry Belenko |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Belenko, Dmitry |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5040671844 |
| authorships[4].author.orcid | https://orcid.org/0009-0009-4920-7348 |
| authorships[4].author.display_name | F.W. Sun |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Sun, Frank |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5115076654 |
| authorships[5].author.orcid | https://orcid.org/0000-0003-0481-2682 |
| authorships[5].author.display_name | Minsik Cho |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Cho, Minsik |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5095886473 |
| authorships[6].author.orcid | |
| authorships[6].author.display_name | Mohammad Hossein Sekhavat |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Sekhavat, Mohammad Hossein |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5001459748 |
| authorships[7].author.orcid | https://orcid.org/0000-0001-7559-9888 |
| authorships[7].author.display_name | Moin Nabi |
| authorships[7].author_position | middle |
| authorships[7].raw_author_name | Nabi, Moin |
| authorships[7].is_corresponding | False |
| authorships[8].author.id | https://openalex.org/A5050499655 |
| authorships[8].author.orcid | https://orcid.org/0000-0002-5510-518X |
| authorships[8].author.display_name | Mehrdad Farajtabar |
| authorships[8].author_position | last |
| authorships[8].raw_author_name | Farajtabar, Mehrdad |
| authorships[8].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2410.10846 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2024-10-20T00:00:00 |
| display_name | Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10028 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9728999733924866 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Topic Modeling |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2899084033, https://openalex.org/W2748952813, https://openalex.org/W2390279801, https://openalex.org/W4391913857, https://openalex.org/W2358668433, https://openalex.org/W4396701345, https://openalex.org/W2376932109, https://openalex.org/W2001405890, https://openalex.org/W4396696052 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2410.10846 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2410.10846 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2410.10846 |
| primary_location.id | pmh:oai:arXiv.org:2410.10846 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2410.10846 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2410.10846 |
| publication_date | 2024-10-01 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 11, 90, 142, 146, 185, 206 |
| abstract_inverted_index.To | 20, 76 |
| abstract_inverted_index.We | 88, 190 |
| abstract_inverted_index.an | 65 |
| abstract_inverted_index.at | 128 |
| abstract_inverted_index.be | 119 |
| abstract_inverted_index.by | 8, 121, 150, 160 |
| abstract_inverted_index.in | 26, 84, 184, 209, 230 |
| abstract_inverted_index.of | 28, 53, 72, 103, 111, 145, 167, 178, 228 |
| abstract_inverted_index.on | 48, 114 |
| abstract_inverted_index.or | 125, 131 |
| abstract_inverted_index.to | 16, 140, 153, 163 |
| abstract_inverted_index.us | 139 |
| abstract_inverted_index.we | 80, 170 |
| abstract_inverted_index.MoE | 188, 231 |
| abstract_inverted_index.all | 220 |
| abstract_inverted_index.and | 34, 51, 180, 199, 233 |
| abstract_inverted_index.big | 126 |
| abstract_inverted_index.can | 44, 118 |
| abstract_inverted_index.for | 61, 236 |
| abstract_inverted_index.gap | 224 |
| abstract_inverted_index.its | 151 |
| abstract_inverted_index.one | 211 |
| abstract_inverted_index.the | 39, 49, 54, 69, 104, 123, 175, 181, 223 |
| abstract_inverted_index.use | 216 |
| abstract_inverted_index.LLM. | 105 |
| abstract_inverted_index.LLMs | 85, 179 |
| abstract_inverted_index.This | 106, 137 |
| abstract_inverted_index.each | 99, 129 |
| abstract_inverted_index.even | 132 |
| abstract_inverted_index.exit | 36 |
| abstract_inverted_index.from | 155, 197 |
| abstract_inverted_index.full | 70 |
| abstract_inverted_index.gain | 171 |
| abstract_inverted_index.into | 174 |
| abstract_inverted_index.just | 210 |
| abstract_inverted_index.more | 86 |
| abstract_inverted_index.open | 66 |
| abstract_inverted_index.show | 191 |
| abstract_inverted_index.task | 115 |
| abstract_inverted_index.that | 41, 93, 192, 215 |
| abstract_inverted_index.this | 22, 78 |
| abstract_inverted_index.vary | 45 |
| abstract_inverted_index.(MoE) | 30 |
| abstract_inverted_index.Large | 0 |
| abstract_inverted_index.based | 47, 113 |
| abstract_inverted_index.early | 35 |
| abstract_inverted_index.fixed | 12 |
| abstract_inverted_index.large | 207, 217 |
| abstract_inverted_index.layer | 102, 212 |
| abstract_inverted_index.need, | 79 |
| abstract_inverted_index.novel | 91, 143 |
| abstract_inverted_index.often | 200 |
| abstract_inverted_index.small | 124 |
| abstract_inverted_index.study | 81 |
| abstract_inverted_index.these | 73 |
| abstract_inverted_index.token | 7, 9 |
| abstract_inverted_index.using | 10 |
| abstract_inverted_index.yield | 201 |
| abstract_inverted_index.(LLMs) | 3 |
| abstract_inverted_index.Models | 2 |
| abstract_inverted_index.across | 219 |
| abstract_inverted_index.allows | 138 |
| abstract_inverted_index.bypass | 133 |
| abstract_inverted_index.design | 107 |
| abstract_inverted_index.either | 122 |
| abstract_inverted_index.expert | 29 |
| abstract_inverted_index.input. | 55 |
| abstract_inverted_index.layer, | 130 |
| abstract_inverted_index.layers | 135 |
| abstract_inverted_index.models | 214, 232 |
| abstract_inverted_index.module | 208 |
| abstract_inverted_index.nature | 52 |
| abstract_inverted_index.notion | 144 |
| abstract_inverted_index.optima | 235 |
| abstract_inverted_index.recent | 24 |
| abstract_inverted_index.setup. | 189 |
| abstract_inverted_index.tokens | 112, 117 |
| abstract_inverted_index.within | 98 |
| abstract_inverted_index.Network | 101 |
| abstract_inverted_index.address | 21, 77 |
| abstract_inverted_index.benefit | 154 |
| abstract_inverted_index.between | 225 |
| abstract_inverted_index.budget, | 14 |
| abstract_inverted_index.certain | 134 |
| abstract_inverted_index.compute | 13 |
| abstract_inverted_index.defined | 149 |
| abstract_inverted_index.demands | 43 |
| abstract_inverted_index.dynamic | 62, 109 |
| abstract_inverted_index.enables | 108 |
| abstract_inverted_index.insight | 40 |
| abstract_inverted_index.layers, | 221 |
| abstract_inverted_index.leading | 15 |
| abstract_inverted_index.mixture | 27 |
| abstract_inverted_index.models, | 31 |
| abstract_inverted_index.modules | 97, 127, 218 |
| abstract_inverted_index.operate | 195 |
| abstract_inverted_index.optimal | 58, 165 |
| abstract_inverted_index.oracles | 162, 198 |
| abstract_inverted_index.outputs | 6 |
| abstract_inverted_index.propose | 89 |
| abstract_inverted_index.remains | 64 |
| abstract_inverted_index.routers | 194 |
| abstract_inverted_index.routing | 59, 110, 182, 229 |
| abstract_inverted_index.smaller | 95 |
| abstract_inverted_index.token's | 147 |
| abstract_inverted_index.trained | 193 |
| abstract_inverted_index.However, | 56 |
| abstract_inverted_index.Language | 1 |
| abstract_inverted_index.Notably, | 204 |
| abstract_inverted_index.adaptive | 74, 82, 168, 237 |
| abstract_inverted_index.generate | 5 |
| abstract_inverted_index.identify | 164 |
| abstract_inverted_index.insights | 173 |
| abstract_inverted_index.internal | 176 |
| abstract_inverted_index.leverage | 38 |
| abstract_inverted_index.limiting | 68 |
| abstract_inverted_index.methods. | 75 |
| abstract_inverted_index.patterns | 60, 166 |
| abstract_inverted_index.resource | 18 |
| abstract_inverted_index.valuable | 172 |
| abstract_inverted_index.workings | 177 |
| abstract_inverted_index.auxiliary | 96 |
| abstract_inverted_index.decoding, | 33 |
| abstract_inverted_index.employing | 161 |
| abstract_inverted_index.entirely. | 136 |
| abstract_inverted_index.execution | 63 |
| abstract_inverted_index.framework | 92 |
| abstract_inverted_index.introduce | 141 |
| abstract_inverted_index.potential | 71, 152 |
| abstract_inverted_index.practical | 226 |
| abstract_inverted_index.processed | 120 |
| abstract_inverted_index.processes | 183 |
| abstract_inverted_index.typically | 4 |
| abstract_inverted_index.activating | 205 |
| abstract_inverted_index.additional | 156 |
| abstract_inverted_index.challenge, | 67 |
| abstract_inverted_index.complexity | 50 |
| abstract_inverted_index.integrates | 94 |
| abstract_inverted_index.resources. | 158 |
| abstract_inverted_index.simplified | 186 |
| abstract_inverted_index.solutions. | 203 |
| abstract_inverted_index.strategies | 37 |
| abstract_inverted_index.suboptimal | 202 |
| abstract_inverted_index.complexity: | 116 |
| abstract_inverted_index.computation | 83 |
| abstract_inverted_index.differently | 196 |
| abstract_inverted_index.difficulty, | 148 |
| abstract_inverted_index.identifying | 57 |
| abstract_inverted_index.inefficient | 17 |
| abstract_inverted_index.outperforms | 213 |
| abstract_inverted_index.speculative | 32 |
| abstract_inverted_index.theoretical | 234 |
| abstract_inverted_index.Feed-Forward | 100 |
| abstract_inverted_index.Importantly, | 159 |
| abstract_inverted_index.advancements | 25 |
| abstract_inverted_index.computation. | 238 |
| abstract_inverted_index.shortcoming, | 23 |
| abstract_inverted_index.underscoring | 222 |
| abstract_inverted_index.utilization. | 19 |
| abstract_inverted_index.computational | 42, 157 |
| abstract_inverted_index.computations, | 169 |
| abstract_inverted_index.heterogeneous | 187 |
| abstract_inverted_index.significantly | 46 |
| abstract_inverted_index.implementations | 227 |
| abstract_inverted_index.systematically. | 87 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 9 |
| citation_normalized_percentile | |
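The `abstract_inverted_index.*` rows in the payload are OpenAlex's inverted-index encoding of the abstract: each word maps to the zero-based positions where it occurs (for example, `token` at 7 and 9). Below is a minimal Python sketch that parses rows in this flattened table format and rebuilds the running text. The function names are hypothetical, and the parser assumes one `| key | positions |` row per distinct word, as in this table. In the OpenAlex JSON itself the field is a nested word-to-positions object, so `reconstruct_abstract` could be applied to it directly.

```python
def parse_inverted_index_rows(rows):
    """Collect `abstract_inverted_index.<word> | <positions>` table rows into a dict."""
    prefix = "abstract_inverted_index."
    index = {}
    for row in rows:
        key, _, value = row.strip().strip("|").partition("|")
        key = key.strip()
        if not key.startswith(prefix):
            continue  # skip every other payload field
        word = key[len(prefix):]  # keeps punctuation such as "LLM." or "(MoE)"
        index[word] = [int(p) for p in value.split(",") if p.strip()]
    return index


def reconstruct_abstract(inverted_index):
    """Place each word at every position it occupies, then join in position order."""
    positions = {}
    for word, idxs in inverted_index.items():
        for i in idxs:
            positions[i] = word
    return " ".join(positions[i] for i in sorted(positions))


rows = [
    "| abstract_inverted_index.token | 7, 9 |",
    "| abstract_inverted_index.by | 8 |",
    "| doi | https://doi.org/10.48550/arxiv.2410.10846 |",
]
print(reconstruct_abstract(parse_inverted_index_rows(rows)))  # → token by token
```

Feeding all `abstract_inverted_index.*` rows through the same two functions should yield the abstract text quoted at the top of the page.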