Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model
2024 · Open Access
· DOI: https://doi.org/10.48550/arxiv.2406.19905
Mixture-of-Experts (MoE) has gained increasing attention in the study of Large Vision-Language Models (LVLMs). It replaces a dense model with a sparse one, achieving comparable performance while activating fewer parameters during inference, which significantly reduces inference cost. Existing MoE methods for LVLMs encourage different experts to specialize in different tokens, and they usually employ a router to predict the routing of each token. However, the router is not optimized with respect to the distinct parameter optimization directions generated by the tokens within an expert, which may lead to severe interference between those tokens. To address this problem, we propose Solving Token Gradient Conflict (STGC), a method based on token-level gradient analysis. Specifically, we first use token-level gradients to identify conflicting tokens in experts. We then add a regularization loss tailored to encourage conflicting tokens to route from their current experts to other experts, reducing interference between tokens within an expert. Our method can serve as a plug-in for diverse LVLM methods, and extensive experimental results demonstrate its effectiveness. The code will be publicly available at https://github.com/longrongyang/STGC.
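The two steps named in the abstract (identify conflicting tokens from token-level gradients, then penalize their current routing) can be illustrated with a small sketch. The abstract does not give the exact criterion or loss, so everything concrete here is an assumption made for illustration: a token is treated as conflicting when its gradient has negative cosine similarity with the mean gradient of the tokens in the same expert, and the regularizer is taken to be the mean of `-log(1 - p)`, where `p` is the router probability of the token's current expert. The function names are hypothetical and are not taken from the STGC repository.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def find_conflicting_tokens(token_grads, threshold=0.0):
    """Assumed criterion: a token conflicts when its gradient points against
    the mean gradient of all tokens routed to the same expert."""
    dim = len(token_grads[0])
    mean = [sum(g[i] for g in token_grads) / len(token_grads) for i in range(dim)]
    return [i for i, g in enumerate(token_grads) if cosine(g, mean) < threshold]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def conflict_regularizer(router_logits, current_expert, conflicting):
    """Assumed loss: mean of -log(1 - p_current) over conflicting tokens.
    It shrinks as the router moves probability away from the current expert."""
    if not conflicting:
        return 0.0
    total = 0.0
    for t in conflicting:
        p_current = softmax(router_logits[t])[current_expert[t]]
        total += -math.log(max(1.0 - p_current, 1e-12))
    return total / len(conflicting)


# Toy data: per-token gradients of expert 0's parameters (3 tokens, 2 parameters).
grads = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.2]]
conflicting = find_conflicting_tokens(grads)
print(conflicting)  # -> [2]

# Router logits over 2 experts for the same 3 tokens, all currently in expert 0.
logits = [[2.0, 0.0], [1.5, 0.5], [1.0, 0.0]]
current = [0, 0, 0]
print(conflict_regularizer(logits, current, conflicting))
```

In a real training loop the regularizer would be added to the task loss with a weighting coefficient, and the token-level gradients would come from an autodiff framework rather than hand-written lists; both details are outside what the abstract specifies.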
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2406.19905 (PDF: https://arxiv.org/pdf/2406.19905)
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4400221755
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4400221755 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2406.19905 (Digital Object Identifier)
- Title: Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-06-28 (full publication date if available)
- Authors: Longrong Yang, Dong Sheng, Chaoxiang Cai, Fan Yang, Size Li, Di Zhang, Xi Li (list of authors in order)
- Landing page: https://arxiv.org/abs/2406.19905 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2406.19905 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2406.19905 (direct OA link when available)
- Concepts: Security token, Computer science, Artificial intelligence, Computer vision, Psychology, Computer security (top concepts (fields/topics) attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4400221755 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2406.19905 |
| ids.doi | https://doi.org/10.48550/arxiv.2406.19905 |
| ids.openalex | https://openalex.org/W4400221755 |
| fwci | |
| type | preprint |
| title | Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11714 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9970999956130981 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1707 |
| topics[0].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[0].display_name | Multimodal Machine Learning Applications |
| topics[1].id | https://openalex.org/T11307 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9871000051498413 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Domain Adaptation and Few-Shot Learning |
| topics[2].id | https://openalex.org/T10627 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9247999787330627 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1707 |
| topics[2].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[2].display_name | Advanced Image and Video Retrieval Techniques |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C48145219 |
| concepts[0].level | 2 |
| concepts[0].score | 0.7792305946350098 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q1335365 |
| concepts[0].display_name | Security token |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.5257614254951477 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C154945302 |
| concepts[2].level | 1 |
| concepts[2].score | 0.5038556456565857 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[2].display_name | Artificial intelligence |
| concepts[3].id | https://openalex.org/C31972630 |
| concepts[3].level | 1 |
| concepts[3].score | 0.4225802719593048 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q844240 |
| concepts[3].display_name | Computer vision |
| concepts[4].id | https://openalex.org/C15744967 |
| concepts[4].level | 0 |
| concepts[4].score | 0.3901526927947998 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q9418 |
| concepts[4].display_name | Psychology |
| concepts[5].id | https://openalex.org/C38652104 |
| concepts[5].level | 1 |
| concepts[5].score | 0.11678394675254822 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q3510521 |
| concepts[5].display_name | Computer security |
| keywords[0].id | https://openalex.org/keywords/security-token |
| keywords[0].score | 0.7792305946350098 |
| keywords[0].display_name | Security token |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.5257614254951477 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[2].score | 0.5038556456565857 |
| keywords[2].display_name | Artificial intelligence |
| keywords[3].id | https://openalex.org/keywords/computer-vision |
| keywords[3].score | 0.4225802719593048 |
| keywords[3].display_name | Computer vision |
| keywords[4].id | https://openalex.org/keywords/psychology |
| keywords[4].score | 0.3901526927947998 |
| keywords[4].display_name | Psychology |
| keywords[5].id | https://openalex.org/keywords/computer-security |
| keywords[5].score | 0.11678394675254822 |
| keywords[5].display_name | Computer security |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2406.19905 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2406.19905 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2406.19905 |
| locations[1].id | doi:10.48550/arxiv.2406.19905 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2406.19905 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5033807873 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-2433-2099 |
| authorships[0].author.display_name | Longrong Yang |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Yang, Longrong |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5111343493 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-5658-5771 |
| authorships[1].author.display_name | Dong Sheng |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Sheng, Dong |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5101675201 |
| authorships[2].author.orcid | https://orcid.org/0000-0003-3322-4273 |
| authorships[2].author.display_name | Chaoxiang Cai |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Cai, Chaoxiang |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5024451201 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-2880-6271 |
| authorships[3].author.display_name | Fan Yang |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Yang, Fan |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5102762699 |
| authorships[4].author.orcid | https://orcid.org/0009-0008-3809-0183 |
| authorships[4].author.display_name | Size Li |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Li, Size |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5100366435 |
| authorships[5].author.orcid | https://orcid.org/0000-0003-2782-3886 |
| authorships[5].author.display_name | Di Zhang |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Zhang, Di |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5100407758 |
| authorships[6].author.orcid | https://orcid.org/0000-0003-3023-1662 |
| authorships[6].author.display_name | Xi Li |
| authorships[6].author_position | last |
| authorships[6].raw_author_name | Li, Xi |
| authorships[6].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2406.19905 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11714 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9970999956130981 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1707 |
| primary_topic.subfield.display_name | Computer Vision and Pattern Recognition |
| primary_topic.display_name | Multimodal Machine Learning Applications |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2748952813, https://openalex.org/W2058170566, https://openalex.org/W2755342338, https://openalex.org/W2772917594, https://openalex.org/W2775347418, https://openalex.org/W2166024367, https://openalex.org/W3116076068, https://openalex.org/W2229312674, https://openalex.org/W2951359407 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2406.19905 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2406.19905 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2406.19905 |
| primary_location.id | pmh:oai:arXiv.org:2406.19905 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2406.19905 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2406.19905 |
| publication_date | 2024-06-28 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 15, 55, 129, 158 |
| abstract_inverted_index.It | 13 |
| abstract_inverted_index.To | 92 |
| abstract_inverted_index.an | 79, 90, 151 |
| abstract_inverted_index.as | 157 |
| abstract_inverted_index.at | 177 |
| abstract_inverted_index.be | 174 |
| abstract_inverted_index.in | 7, 41, 48, 110, 123 |
| abstract_inverted_index.is | 67 |
| abstract_inverted_index.of | 61 |
| abstract_inverted_index.to | 18, 46, 57, 84, 98, 104, 119, 133, 142 |
| abstract_inverted_index.we | 96, 114, 127 |
| abstract_inverted_index.MoE | 39 |
| abstract_inverted_index.Our | 153 |
| abstract_inverted_index.The | 0, 171 |
| abstract_inverted_index.add | 128 |
| abstract_inverted_index.and | 51, 164 |
| abstract_inverted_index.can | 155 |
| abstract_inverted_index.for | 145, 160 |
| abstract_inverted_index.has | 3 |
| abstract_inverted_index.its | 169 |
| abstract_inverted_index.may | 82 |
| abstract_inverted_index.not | 68 |
| abstract_inverted_index.the | 20, 35, 59, 65, 100 |
| abstract_inverted_index.use | 99, 116 |
| abstract_inverted_index.LVLM | 42, 162 |
| abstract_inverted_index.This | 81 |
| abstract_inverted_index.code | 172 |
| abstract_inverted_index.each | 62 |
| abstract_inverted_index.from | 76, 138 |
| abstract_inverted_index.lead | 83 |
| abstract_inverted_index.loss | 131 |
| abstract_inverted_index.they | 52 |
| abstract_inverted_index.this | 94, 111 |
| abstract_inverted_index.thus | 32 |
| abstract_inverted_index.uses | 14 |
| abstract_inverted_index.will | 173 |
| abstract_inverted_index.(MoE) | 2 |
| abstract_inverted_index.After | 125 |
| abstract_inverted_index.Large | 9 |
| abstract_inverted_index.Token | 106 |
| abstract_inverted_index.cost. | 37 |
| abstract_inverted_index.dense | 21 |
| abstract_inverted_index.fewer | 28 |
| abstract_inverted_index.first | 115 |
| abstract_inverted_index.model | 17 |
| abstract_inverted_index.other | 143 |
| abstract_inverted_index.serve | 156 |
| abstract_inverted_index.that, | 126 |
| abstract_inverted_index.their | 139 |
| abstract_inverted_index.while | 26 |
| abstract_inverted_index.(STGC) | 109 |
| abstract_inverted_index.Models | 11 |
| abstract_inverted_index.during | 30 |
| abstract_inverted_index.employ | 54 |
| abstract_inverted_index.gained | 4 |
| abstract_inverted_index.method | 154 |
| abstract_inverted_index.model, | 22 |
| abstract_inverted_index.paper. | 112 |
| abstract_inverted_index.router | 56, 66 |
| abstract_inverted_index.severe | 85 |
| abstract_inverted_index.sparse | 16 |
| abstract_inverted_index.token. | 63 |
| abstract_inverted_index.tokens | 77, 88, 122, 136, 149 |
| abstract_inverted_index.within | 78, 89, 150 |
| abstract_inverted_index.Solving | 105 |
| abstract_inverted_index.address | 93 |
| abstract_inverted_index.between | 87, 148 |
| abstract_inverted_index.current | 140 |
| abstract_inverted_index.diverse | 161 |
| abstract_inverted_index.expert. | 80, 91, 152 |
| abstract_inverted_index.experts | 45, 141 |
| abstract_inverted_index.methods | 40 |
| abstract_inverted_index.plug-in | 159 |
| abstract_inverted_index.predict | 58 |
| abstract_inverted_index.propose | 97 |
| abstract_inverted_index.replace | 19 |
| abstract_inverted_index.results | 167 |
| abstract_inverted_index.routing | 60, 137 |
| abstract_inverted_index.tokens, | 50 |
| abstract_inverted_index.usually | 53 |
| abstract_inverted_index.(LVLMs). | 12 |
| abstract_inverted_index.Conflict | 108 |
| abstract_inverted_index.Existing | 38 |
| abstract_inverted_index.Gradient | 107 |
| abstract_inverted_index.However, | 64 |
| abstract_inverted_index.analysis | 103 |
| abstract_inverted_index.distinct | 71 |
| abstract_inverted_index.experts, | 144 |
| abstract_inverted_index.experts. | 124 |
| abstract_inverted_index.gradient | 102 |
| abstract_inverted_index.identify | 120 |
| abstract_inverted_index.methods, | 163 |
| abstract_inverted_index.problem, | 95 |
| abstract_inverted_index.publicly | 175 |
| abstract_inverted_index.reducing | 34, 146 |
| abstract_inverted_index.studying | 8 |
| abstract_inverted_index.tailored | 132 |
| abstract_inverted_index.achieving | 23 |
| abstract_inverted_index.attention | 6 |
| abstract_inverted_index.available | 176 |
| abstract_inverted_index.different | 44, 49 |
| abstract_inverted_index.encourage | 43, 134 |
| abstract_inverted_index.extensive | 165 |
| abstract_inverted_index.generated | 75 |
| abstract_inverted_index.gradients | 118 |
| abstract_inverted_index.inference | 36 |
| abstract_inverted_index.optimized | 69 |
| abstract_inverted_index.parameter | 72 |
| abstract_inverted_index.activating | 27 |
| abstract_inverted_index.comparable | 24 |
| abstract_inverted_index.concerning | 70 |
| abstract_inverted_index.directions | 74 |
| abstract_inverted_index.increasing | 5 |
| abstract_inverted_index.inference, | 31 |
| abstract_inverted_index.parameters | 29 |
| abstract_inverted_index.specialize | 47 |
| abstract_inverted_index.conflicting | 121, 135 |
| abstract_inverted_index.demonstrate | 168 |
| abstract_inverted_index.performance | 25 |
| abstract_inverted_index.token-level | 101, 117 |
| abstract_inverted_index.experimental | 166 |
| abstract_inverted_index.interference | 86, 147 |
| abstract_inverted_index.optimization | 73 |
| abstract_inverted_index.Specifically, | 113 |
| abstract_inverted_index.significantly | 33 |
| abstract_inverted_index.effectiveness. | 170 |
| abstract_inverted_index.regularization | 130 |
| abstract_inverted_index.Vision-Language | 10 |
| abstract_inverted_index.Mixture-of-Experts | 1 |
| abstract_inverted_index.https://github.com/longrongyang/STGC. | 178 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 7 |
| citation_normalized_percentile | |
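
The `abstract_inverted_index.*` rows in the payload above store the abstract as a map from each word to the zero-based positions where it occurs. A minimal sketch of how those rows could be turned back into running text follows; the helper names `parse_rows` and `rebuild_abstract` are hypothetical, and the parser assumes each row has exactly the two-cell `| key | value |` shape used in this table.

```python
PREFIX = "abstract_inverted_index."


def parse_rows(rows):
    """Collect {word: [positions]} from '| abstract_inverted_index.<word> | 1, 2 |' rows."""
    index = {}
    for row in rows:
        cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
        if len(cells) != 2 or not cells[0].startswith(PREFIX):
            continue  # skip rows that belong to other payload fields
        word = cells[0][len(PREFIX):]
        index[word] = [int(pos) for pos in cells[1].split(",")]
    return index


def rebuild_abstract(inverted_index):
    """Place every word at each of its positions, then join in position order."""
    by_position = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = word
    return " ".join(by_position[pos] for pos in sorted(by_position))


# A few rows in the table's format (positions limited to the first seven words).
rows = [
    "| abstract_inverted_index.has | 3 |",
    "| abstract_inverted_index.The | 0 |",
    "| abstract_inverted_index.(MoE) | 2 |",
    "| abstract_inverted_index.gained | 4 |",
    "| abstract_inverted_index.attention | 6 |",
    "| abstract_inverted_index.increasing | 5 |",
    "| abstract_inverted_index.Mixture-of-Experts | 1 |",
    "| language | en |",
]
print(rebuild_abstract(parse_rows(rows)))
# -> The Mixture-of-Experts (MoE) has gained increasing attention
```

Feeding all of the table's `abstract_inverted_index.*` rows through the same two functions should reproduce the abstract shown at the top of this record, since every position from 0 to 178 appears exactly once.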