Learning to Merge Tokens in Vision Transformers
Cédric Renggli, André Susano Pinto, Neil Houlsby, Basil Mustafa, Joan Puigcerver, Carlos Riquelme
2022 · Open Access · DOI: https://doi.org/10.48550/arxiv.2202.12015
Transformers are widely applied to solve natural language understanding and computer vision tasks. While scaling up these architectures leads to improved performance, it often comes at the expense of much higher computational costs. In order for large-scale models to remain practical in real-world systems, there is a need for reducing their computational overhead. In this work, we present the PatchMerger, a simple module that reduces the number of patches or tokens the network has to process by merging them between two consecutive intermediate layers. We show that the PatchMerger achieves a significant speedup across various model sizes while matching the original performance both upstream and downstream after fine-tuning.
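The abstract describes the PatchMerger only at a high level and gives no formula. The sketch below is one plausible reading, not the authors' stated implementation: it assumes the module scores every input token against a small set of learned vectors, normalizes the scores with a softmax over the input tokens, and emits one weighted average per learned vector. The names `merge_tokens`, `weights`, and the sizes `N`, `D`, `M` are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def merge_tokens(tokens, weights):
    """Merge N input tokens into M output tokens.

    tokens:  N x D list of floats (the tokens leaving one layer).
    weights: M x D list of floats (hypothetical learned parameters).
    Returns an M x D list; each output row is a convex combination
    of the input tokens.
    """
    dim = len(tokens[0])
    merged = []
    for w in weights:
        # One score per input token: dot product with the learned vector.
        scores = [sum(wi * xi for wi, xi in zip(w, tok)) for tok in tokens]
        # Normalize across the N input tokens so the weights sum to 1.
        attn = softmax(scores)
        merged.append(
            [sum(a * tok[d] for a, tok in zip(attn, tokens)) for d in range(dim)]
        )
    return merged


random.seed(0)
N, D, M = 16, 8, 4  # assumed sizes, for illustration only
tokens = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]
weights = [[random.gauss(0, 1) for _ in range(D)] for _ in range(M)]
merged = merge_tokens(tokens, weights)
print(len(tokens), "->", len(merged), "tokens of width", len(merged[0]))
```

Under this assumption, every layer after the merge processes `M` tokens instead of `N`. Because self-attention cost grows quadratically with the number of tokens, shrinking the token count partway through the network is where a speedup of the kind the abstract reports would come from.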
Metadata
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2202.12015, https://arxiv.org/pdf/2202.12015
- OA Status: green
- Cited By: 24
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4226161578
All OpenAlex metadata
- OpenAlex ID: https://openalex.org/W4226161578 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2202.12015 (Digital Object Identifier)
- Title: Learning to Merge Tokens in Vision Transformers (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2022
- Publication date: 2022-02-24 (full publication date)
- Authors: Cédric Renggli, André Susano Pinto, Neil Houlsby, Basil Mustafa, Joan Puigcerver, Carlos Riquelme (in order)
- Landing page: https://arxiv.org/abs/2202.12015 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2202.12015 (direct link to full-text PDF)
- Open access: Yes (a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2202.12015 (direct OA link)
- Concepts: Computer science, Merge (version control), Speedup, Transformer, Scaling, Computer engineering, Parallel computing, Artificial intelligence, Distributed computing, Engineering, Mathematics, Voltage, Geometry, Electrical engineering (top concepts attached by OpenAlex)
- Cited by: 24 (total citation count in OpenAlex)
- Citations by year (recent): 2025: 4, 2024: 6, 2023: 11, 2022: 3 (per-year counts, last 5 years)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4226161578 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2202.12015 |
| ids.doi | https://doi.org/10.48550/arxiv.2202.12015 |
| ids.openalex | https://openalex.org/W4226161578 |
| fwci | |
| type | preprint |
| title | Learning to Merge Tokens in Vision Transformers |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11714 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9991000294685364 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1707 |
| topics[0].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[0].display_name | Multimodal Machine Learning Applications |
| topics[1].id | https://openalex.org/T10036 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.996999979019165 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1707 |
| topics[1].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[1].display_name | Advanced Neural Network Applications |
| topics[2].id | https://openalex.org/T11307 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9923999905586243 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Domain Adaptation and Few-Shot Learning |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.8229330778121948 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C197129107 |
| concepts[1].level | 2 |
| concepts[1].score | 0.8153785467147827 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q1921621 |
| concepts[1].display_name | Merge (version control) |
| concepts[2].id | https://openalex.org/C68339613 |
| concepts[2].level | 2 |
| concepts[2].score | 0.8099373579025269 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q1549489 |
| concepts[2].display_name | Speedup |
| concepts[3].id | https://openalex.org/C66322947 |
| concepts[3].level | 3 |
| concepts[3].score | 0.6336356401443481 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q11658 |
| concepts[3].display_name | Transformer |
| concepts[4].id | https://openalex.org/C99844830 |
| concepts[4].level | 2 |
| concepts[4].score | 0.5483218431472778 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q102441924 |
| concepts[4].display_name | Scaling |
| concepts[5].id | https://openalex.org/C113775141 |
| concepts[5].level | 1 |
| concepts[5].score | 0.45270296931266785 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q428691 |
| concepts[5].display_name | Computer engineering |
| concepts[6].id | https://openalex.org/C173608175 |
| concepts[6].level | 1 |
| concepts[6].score | 0.42301711440086365 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q232661 |
| concepts[6].display_name | Parallel computing |
| concepts[7].id | https://openalex.org/C154945302 |
| concepts[7].level | 1 |
| concepts[7].score | 0.4038671553134918 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[7].display_name | Artificial intelligence |
| concepts[8].id | https://openalex.org/C120314980 |
| concepts[8].level | 1 |
| concepts[8].score | 0.3700162172317505 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q180634 |
| concepts[8].display_name | Distributed computing |
| concepts[9].id | https://openalex.org/C127413603 |
| concepts[9].level | 0 |
| concepts[9].score | 0.08775058388710022 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q11023 |
| concepts[9].display_name | Engineering |
| concepts[10].id | https://openalex.org/C33923547 |
| concepts[10].level | 0 |
| concepts[10].score | 0.0 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[10].display_name | Mathematics |
| concepts[11].id | https://openalex.org/C165801399 |
| concepts[11].level | 2 |
| concepts[11].score | 0.0 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q25428 |
| concepts[11].display_name | Voltage |
| concepts[12].id | https://openalex.org/C2524010 |
| concepts[12].level | 1 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q8087 |
| concepts[12].display_name | Geometry |
| concepts[13].id | https://openalex.org/C119599485 |
| concepts[13].level | 1 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q43035 |
| concepts[13].display_name | Electrical engineering |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.8229330778121948 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/merge |
| keywords[1].score | 0.8153785467147827 |
| keywords[1].display_name | Merge (version control) |
| keywords[2].id | https://openalex.org/keywords/speedup |
| keywords[2].score | 0.8099373579025269 |
| keywords[2].display_name | Speedup |
| keywords[3].id | https://openalex.org/keywords/transformer |
| keywords[3].score | 0.6336356401443481 |
| keywords[3].display_name | Transformer |
| keywords[4].id | https://openalex.org/keywords/scaling |
| keywords[4].score | 0.5483218431472778 |
| keywords[4].display_name | Scaling |
| keywords[5].id | https://openalex.org/keywords/computer-engineering |
| keywords[5].score | 0.45270296931266785 |
| keywords[5].display_name | Computer engineering |
| keywords[6].id | https://openalex.org/keywords/parallel-computing |
| keywords[6].score | 0.42301711440086365 |
| keywords[6].display_name | Parallel computing |
| keywords[7].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[7].score | 0.4038671553134918 |
| keywords[7].display_name | Artificial intelligence |
| keywords[8].id | https://openalex.org/keywords/distributed-computing |
| keywords[8].score | 0.3700162172317505 |
| keywords[8].display_name | Distributed computing |
| keywords[9].id | https://openalex.org/keywords/engineering |
| keywords[9].score | 0.08775058388710022 |
| keywords[9].display_name | Engineering |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2202.12015 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2202.12015 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2202.12015 |
| locations[1].id | doi:10.48550/arxiv.2202.12015 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2202.12015 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5063262952 |
| authorships[0].author.orcid | https://orcid.org/0000-0003-3271-3059 |
| authorships[0].author.display_name | Cédric Renggli |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Renggli, Cedric |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5102389365 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | André Susano Pinto |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Pinto, André Susano |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5068878643 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Neil Houlsby |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Houlsby, Neil |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5072796087 |
| authorships[3].author.orcid | https://orcid.org/0000-0001-7305-7890 |
| authorships[3].author.display_name | Basil Mustafa |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Mustafa, Basil |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5001091279 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-1926-2233 |
| authorships[4].author.display_name | Joan Puigcerver |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Puigcerver, Joan |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5112854801 |
| authorships[5].author.orcid | |
| authorships[5].author.display_name | Carlos Riquelme |
| authorships[5].author_position | last |
| authorships[5].raw_author_name | Riquelme, Carlos |
| authorships[5].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2202.12015 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Learning to Merge Tokens in Vision Transformers |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11714 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9991000294685364 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1707 |
| primary_topic.subfield.display_name | Computer Vision and Pattern Recognition |
| primary_topic.display_name | Multimodal Machine Learning Applications |
| related_works | https://openalex.org/W2058965144, https://openalex.org/W2164382479, https://openalex.org/W2146343568, https://openalex.org/W98480971, https://openalex.org/W2150291671, https://openalex.org/W2013643406, https://openalex.org/W2027972911, https://openalex.org/W2157978810, https://openalex.org/W2597809628, https://openalex.org/W3046370962 |
| cited_by_count | 24 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 4 |
| counts_by_year[1].year | 2024 |
| counts_by_year[1].cited_by_count | 6 |
| counts_by_year[2].year | 2023 |
| counts_by_year[2].cited_by_count | 11 |
| counts_by_year[3].year | 2022 |
| counts_by_year[3].cited_by_count | 3 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2202.12015 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2202.12015 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2202.12015 |
| primary_location.id | pmh:oai:arXiv.org:2202.12015 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2202.12015 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2202.12015 |
| publication_date | 2022-02-24 |
| publication_year | 2022 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 46, 60, 90 |
| abstract_inverted_index.In | 33, 53 |
| abstract_inverted_index.We | 84 |
| abstract_inverted_index.at | 25 |
| abstract_inverted_index.by | 76 |
| abstract_inverted_index.in | 41 |
| abstract_inverted_index.is | 45 |
| abstract_inverted_index.it | 22 |
| abstract_inverted_index.of | 28, 67 |
| abstract_inverted_index.or | 69 |
| abstract_inverted_index.to | 4, 19, 38, 74 |
| abstract_inverted_index.up | 15 |
| abstract_inverted_index.we | 56 |
| abstract_inverted_index.and | 9, 104 |
| abstract_inverted_index.are | 1 |
| abstract_inverted_index.for | 35, 48 |
| abstract_inverted_index.has | 73 |
| abstract_inverted_index.the | 26, 58, 65, 71, 87, 99 |
| abstract_inverted_index.two | 80 |
| abstract_inverted_index.both | 102 |
| abstract_inverted_index.much | 29 |
| abstract_inverted_index.need | 47 |
| abstract_inverted_index.show | 85 |
| abstract_inverted_index.that | 63, 86 |
| abstract_inverted_index.them | 78 |
| abstract_inverted_index.this | 54 |
| abstract_inverted_index.While | 13 |
| abstract_inverted_index.after | 106 |
| abstract_inverted_index.comes | 24 |
| abstract_inverted_index.leads | 18 |
| abstract_inverted_index.model | 95 |
| abstract_inverted_index.often | 23 |
| abstract_inverted_index.order | 34 |
| abstract_inverted_index.sizes | 96 |
| abstract_inverted_index.solve | 5 |
| abstract_inverted_index.their | 50 |
| abstract_inverted_index.there | 44 |
| abstract_inverted_index.these | 16 |
| abstract_inverted_index.while | 97 |
| abstract_inverted_index.work, | 55 |
| abstract_inverted_index.across | 93 |
| abstract_inverted_index.costs. | 32 |
| abstract_inverted_index.higher | 30 |
| abstract_inverted_index.models | 37 |
| abstract_inverted_index.module | 62 |
| abstract_inverted_index.number | 66 |
| abstract_inverted_index.remain | 39 |
| abstract_inverted_index.simple | 61 |
| abstract_inverted_index.tasks. | 12 |
| abstract_inverted_index.tokens | 70 |
| abstract_inverted_index.vision | 11 |
| abstract_inverted_index.widely | 2 |
| abstract_inverted_index.applied | 3 |
| abstract_inverted_index.between | 79 |
| abstract_inverted_index.expense | 27 |
| abstract_inverted_index.layers. | 83 |
| abstract_inverted_index.merging | 77 |
| abstract_inverted_index.natural | 6 |
| abstract_inverted_index.network | 72 |
| abstract_inverted_index.patches | 68 |
| abstract_inverted_index.present | 57 |
| abstract_inverted_index.process | 75 |
| abstract_inverted_index.reduces | 64 |
| abstract_inverted_index.scaling | 14 |
| abstract_inverted_index.speedup | 92 |
| abstract_inverted_index.various | 94 |
| abstract_inverted_index.achieves | 89 |
| abstract_inverted_index.computer | 10 |
| abstract_inverted_index.improved | 20 |
| abstract_inverted_index.language | 7 |
| abstract_inverted_index.matching | 98 |
| abstract_inverted_index.original | 100 |
| abstract_inverted_index.reducing | 49 |
| abstract_inverted_index.systems, | 43 |
| abstract_inverted_index.upstream | 103 |
| abstract_inverted_index.overhead. | 52 |
| abstract_inverted_index.practical | 40 |
| abstract_inverted_index.downstream | 105 |
| abstract_inverted_index.real-world | 42 |
| abstract_inverted_index.PatchMerger | 88 |
| abstract_inverted_index.consecutive | 81 |
| abstract_inverted_index.large-scale | 36 |
| abstract_inverted_index.performance | 101 |
| abstract_inverted_index.significant | 91 |
| abstract_inverted_index.PatchMerger, | 59 |
| abstract_inverted_index.Transformers | 0 |
| abstract_inverted_index.fine-tuning. | 107 |
| abstract_inverted_index.intermediate | 82 |
| abstract_inverted_index.performance, | 21 |
| abstract_inverted_index.architectures | 17 |
| abstract_inverted_index.computational | 31, 51 |
| abstract_inverted_index.understanding | 8 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 6 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/4 |
| sustainable_development_goals[0].score | 0.75 |
| sustainable_development_goals[0].display_name | Quality Education |
| citation_normalized_percentile |
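
The `abstract_inverted_index.*` rows in the payload above store the abstract as a map from each word to the positions where it occurs. A minimal sketch of turning such a map back into running text follows. It assumes the index is available as a Python dict from word to a list of integer positions; the function name `rebuild_abstract` is hypothetical, and the sample holds only the entries for positions 0 to 12, copied from the rows above.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a word -> [positions] mapping."""
    by_position = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = word
    # Emit words in ascending position order, separated by single spaces.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Entries for the first sentence only; later positions of "to" and "and"
# are left out here to keep the sample short.
sample = {
    "Transformers": [0],
    "are": [1],
    "widely": [2],
    "applied": [3],
    "to": [4],
    "solve": [5],
    "natural": [6],
    "language": [7],
    "understanding": [8],
    "and": [9],
    "computer": [10],
    "vision": [11],
    "tasks.": [12],
}

text = rebuild_abstract(sample)
print(text)
```

Punctuation stays attached to its word in the index (for example `tasks.`), so joining on spaces is enough to recover the sentence.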