WaveMix: Resource-efficient Token Mixing for Images
2022 · Open Access
· DOI: https://doi.org/10.48550/arxiv.2203.03689
Although certain vision transformer (ViT) and CNN architectures generalize well on vision tasks, it is often impractical to use them on green, edge, or desktop computing because of their computational requirements for training and even testing. We present WaveMix, an alternative neural architecture that uses a multi-scale 2D discrete wavelet transform (DWT) for spatial token mixing. Unlike ViTs, WaveMix neither unrolls the image nor requires self-attention of quadratic complexity. Additionally, the DWT introduces another inductive bias, besides convolutional filtering, that exploits the 2D structure of an image to improve generalization. The multi-scale nature of the DWT also reduces the need for a deeper architecture compared to CNNs, which rely on pooling for partial spatial mixing. WaveMix models show generalization that is competitive with ViTs, CNNs, and token mixers on several datasets while requiring less GPU RAM (for training and testing), fewer computations, and less storage. WaveMix has achieved state-of-the-art (SOTA) results on the EMNIST Byclass and EMNIST Balanced datasets.
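To make the idea of DWT-based spatial mixing concrete, here is a minimal sketch of a single-level 2D Haar wavelet transform in plain Python. It is an illustration only, not the authors' WaveMix implementation: the choice of the Haar wavelet, the function name `haar_dwt2`, and the sub-band names are assumptions made for this example. Each output coefficient combines a 2×2 neighbourhood, so applying the transform again to the approximation band widens the mixed region without any attention or pooling layer.

```python
def haar_dwt2(image):
    """Single-level orthonormal 2D Haar DWT (illustrative, hypothetical helper).

    `image` is a list of equal-length rows with even height and width.
    Returns four half-resolution sub-bands:
    (approximation, column difference, row difference, diagonal difference).
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image height and width must both be even")

    approx, col_diff, row_diff, diag = [], [], [], []
    for r in range(0, rows, 2):
        a_row, c_row, r_row, d_row = [], [], [], []
        for c in range(0, cols, 2):
            # The four pixels of one non-overlapping 2x2 block.
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            a_row.append((a + b + d + e) / 2)  # local average (scaled)
            c_row.append((a - b + d - e) / 2)  # left column minus right column
            r_row.append((a + b - d - e) / 2)  # top row minus bottom row
            d_row.append((a - b - d + e) / 2)  # diagonal contrast
        approx.append(a_row)
        col_diff.append(c_row)
        row_diff.append(r_row)
        diag.append(d_row)
    return approx, col_diff, row_diff, diag


# Usage: two levels on a 4x4 grid. The second-level approximation is a
# single coefficient that depends on every pixel of the input.
image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
approx, col_diff, row_diff, diag = haar_dwt2(image)
approx2, _, _, _ = haar_dwt2(approx)
```

Because the transform is orthonormal, the sum of squared coefficients across the four sub-bands equals the sum of squared pixel values, so no information is discarded at this step.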
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2203.03689, https://arxiv.org/pdf/2203.03689
- OA Status: green
- Cited By: 2
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4226277904
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4226277904
- DOI: https://doi.org/10.48550/arxiv.2203.03689
- Title: WaveMix: Resource-efficient Token Mixing for Images
- Type: preprint
- Language: en
- Publication year: 2022
- Publication date: 2022-03-07
- Authors: Pranav Jeevan, Amit Sethi
- Landing page: https://arxiv.org/abs/2203.03689
- PDF URL: https://arxiv.org/pdf/2203.03689
- Open access: Yes
- OA status: green
- OA URL: https://arxiv.org/pdf/2203.03689
- Concepts: Computer science, Pooling, Security token, Convolutional neural network, Generalization, Artificial intelligence, Discrete wavelet transform, Wavelet, Pattern recognition (psychology), Wavelet transform, Mathematics, Mathematical analysis, Computer security
- Cited by: 2
- Citations by year (recent): 2024: 1, 2022: 1
- Related works (count): 10
Full payload
| id | https://openalex.org/W4226277904 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2203.03689 |
| ids.doi | https://doi.org/10.48550/arxiv.2203.03689 |
| ids.openalex | https://openalex.org/W4226277904 |
| fwci | |
| type | preprint |
| title | WaveMix: Resource-efficient Token Mixing for Images |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10689 |
| topics[0].field.id | https://openalex.org/fields/22 |
| topics[0].field.display_name | Engineering |
| topics[0].score | 0.9916999936103821 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/2214 |
| topics[0].subfield.display_name | Media Technology |
| topics[0].display_name | Remote-Sensing Image Classification |
| topics[1].id | https://openalex.org/T11659 |
| topics[1].field.id | https://openalex.org/fields/22 |
| topics[1].field.display_name | Engineering |
| topics[1].score | 0.9898999929428101 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/2214 |
| topics[1].subfield.display_name | Media Technology |
| topics[1].display_name | Advanced Image Fusion Techniques |
| topics[2].id | https://openalex.org/T10688 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9887999892234802 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1707 |
| topics[2].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[2].display_name | Image and Signal Denoising Methods |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.7745069265365601 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C70437156 |
| concepts[1].level | 2 |
| concepts[1].score | 0.6224770545959473 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q7228652 |
| concepts[1].display_name | Pooling |
| concepts[2].id | https://openalex.org/C48145219 |
| concepts[2].level | 2 |
| concepts[2].score | 0.6094322204589844 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q1335365 |
| concepts[2].display_name | Security token |
| concepts[3].id | https://openalex.org/C81363708 |
| concepts[3].level | 2 |
| concepts[3].score | 0.5164752006530762 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q17084460 |
| concepts[3].display_name | Convolutional neural network |
| concepts[4].id | https://openalex.org/C177148314 |
| concepts[4].level | 2 |
| concepts[4].score | 0.5135319828987122 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q170084 |
| concepts[4].display_name | Generalization |
| concepts[5].id | https://openalex.org/C154945302 |
| concepts[5].level | 1 |
| concepts[5].score | 0.5045326948165894 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[5].display_name | Artificial intelligence |
| concepts[6].id | https://openalex.org/C46286280 |
| concepts[6].level | 4 |
| concepts[6].score | 0.4660279452800751 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q2414958 |
| concepts[6].display_name | Discrete wavelet transform |
| concepts[7].id | https://openalex.org/C47432892 |
| concepts[7].level | 2 |
| concepts[7].score | 0.3234606981277466 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q831390 |
| concepts[7].display_name | Wavelet |
| concepts[8].id | https://openalex.org/C153180895 |
| concepts[8].level | 2 |
| concepts[8].score | 0.3223114013671875 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q7148389 |
| concepts[8].display_name | Pattern recognition (psychology) |
| concepts[9].id | https://openalex.org/C196216189 |
| concepts[9].level | 3 |
| concepts[9].score | 0.2896353602409363 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q2867 |
| concepts[9].display_name | Wavelet transform |
| concepts[10].id | https://openalex.org/C33923547 |
| concepts[10].level | 0 |
| concepts[10].score | 0.10134962201118469 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[10].display_name | Mathematics |
| concepts[11].id | https://openalex.org/C134306372 |
| concepts[11].level | 1 |
| concepts[11].score | 0.0 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q7754 |
| concepts[11].display_name | Mathematical analysis |
| concepts[12].id | https://openalex.org/C38652104 |
| concepts[12].level | 1 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q3510521 |
| concepts[12].display_name | Computer security |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.7745069265365601 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/pooling |
| keywords[1].score | 0.6224770545959473 |
| keywords[1].display_name | Pooling |
| keywords[2].id | https://openalex.org/keywords/security-token |
| keywords[2].score | 0.6094322204589844 |
| keywords[2].display_name | Security token |
| keywords[3].id | https://openalex.org/keywords/convolutional-neural-network |
| keywords[3].score | 0.5164752006530762 |
| keywords[3].display_name | Convolutional neural network |
| keywords[4].id | https://openalex.org/keywords/generalization |
| keywords[4].score | 0.5135319828987122 |
| keywords[4].display_name | Generalization |
| keywords[5].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[5].score | 0.5045326948165894 |
| keywords[5].display_name | Artificial intelligence |
| keywords[6].id | https://openalex.org/keywords/discrete-wavelet-transform |
| keywords[6].score | 0.4660279452800751 |
| keywords[6].display_name | Discrete wavelet transform |
| keywords[7].id | https://openalex.org/keywords/wavelet |
| keywords[7].score | 0.3234606981277466 |
| keywords[7].display_name | Wavelet |
| keywords[8].id | https://openalex.org/keywords/pattern-recognition |
| keywords[8].score | 0.3223114013671875 |
| keywords[8].display_name | Pattern recognition (psychology) |
| keywords[9].id | https://openalex.org/keywords/wavelet-transform |
| keywords[9].score | 0.2896353602409363 |
| keywords[9].display_name | Wavelet transform |
| keywords[10].id | https://openalex.org/keywords/mathematics |
| keywords[10].score | 0.10134962201118469 |
| keywords[10].display_name | Mathematics |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2203.03689 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2203.03689 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2203.03689 |
| locations[1].id | doi:10.48550/arxiv.2203.03689 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2203.03689 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5090066782 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Pranav Jeevan |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Jeevan, Pranav |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5087629613 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-8634-1804 |
| authorships[1].author.display_name | Amit Sethi |
| authorships[1].author_position | last |
| authorships[1].raw_author_name | Sethi, Amit |
| authorships[1].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2203.03689 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2022-05-05T00:00:00 |
| display_name | WaveMix: Resource-efficient Token Mixing for Images |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10689 |
| primary_topic.field.id | https://openalex.org/fields/22 |
| primary_topic.field.display_name | Engineering |
| primary_topic.score | 0.9916999936103821 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/2214 |
| primary_topic.subfield.display_name | Media Technology |
| primary_topic.display_name | Remote-Sensing Image Classification |
| related_works | https://openalex.org/W2953234277, https://openalex.org/W2626256601, https://openalex.org/W2900413183, https://openalex.org/W183670115, https://openalex.org/W1501179639, https://openalex.org/W3199035354, https://openalex.org/W1807354010, https://openalex.org/W3143644526, https://openalex.org/W598225674, https://openalex.org/W2734230146 |
| cited_by_count | 2 |
| counts_by_year[0].year | 2024 |
| counts_by_year[0].cited_by_count | 1 |
| counts_by_year[1].year | 2022 |
| counts_by_year[1].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2203.03689 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2203.03689 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2203.03689 |
| primary_location.id | pmh:oai:arXiv.org:2203.03689 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2203.03689 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2203.03689 |
| publication_date | 2022-03-07 |
| publication_year | 2022 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 46, 103 |
| abstract_inverted_index.-- | 76, 80 |
| abstract_inverted_index.2D | 48, 84 |
| abstract_inverted_index.We | 36 |
| abstract_inverted_index.an | 40, 87 |
| abstract_inverted_index.as | 39, 110 |
| abstract_inverted_index.in | 155 |
| abstract_inverted_index.is | 14, 125 |
| abstract_inverted_index.it | 13 |
| abstract_inverted_index.of | 67, 86, 95, 145 |
| abstract_inverted_index.on | 10, 20, 114, 133 |
| abstract_inverted_index.or | 23 |
| abstract_inverted_index.to | 17, 27, 81, 89, 107 |
| abstract_inverted_index.CNN | 6 |
| abstract_inverted_index.DWT | 71, 97 |
| abstract_inverted_index.GPU | 139 |
| abstract_inverted_index.RAM | 140 |
| abstract_inverted_index.The | 92 |
| abstract_inverted_index.and | 5, 33, 130, 142, 147, 158 |
| abstract_inverted_index.due | 26 |
| abstract_inverted_index.for | 31, 53, 102, 116 |
| abstract_inverted_index.nor | 64 |
| abstract_inverted_index.the | 62, 83, 96, 100, 108, 111 |
| abstract_inverted_index.use | 18 |
| abstract_inverted_index.also | 98 |
| abstract_inverted_index.bias | 75 |
| abstract_inverted_index.even | 34 |
| abstract_inverted_index.have | 150 |
| abstract_inverted_index.show | 122 |
| abstract_inverted_index.that | 44, 124 |
| abstract_inverted_index.them | 19 |
| abstract_inverted_index.uses | 45 |
| abstract_inverted_index.well | 9 |
| abstract_inverted_index.with | 127 |
| abstract_inverted_index.(DWT) | 52 |
| abstract_inverted_index.(ViT) | 4 |
| abstract_inverted_index.CNNs, | 109, 129 |
| abstract_inverted_index.ViTs, | 58, 128 |
| abstract_inverted_index.edge, | 22 |
| abstract_inverted_index.image | 63, 88 |
| abstract_inverted_index.lower | 138 |
| abstract_inverted_index.often | 15 |
| abstract_inverted_index.their | 28 |
| abstract_inverted_index.token | 55, 131 |
| abstract_inverted_index.while | 136 |
| abstract_inverted_index.(SOTA) | 153 |
| abstract_inverted_index.EMNIST | 156, 159 |
| abstract_inverted_index.Unlike | 57 |
| abstract_inverted_index.deeper | 104 |
| abstract_inverted_index.green, | 21 |
| abstract_inverted_index.latter | 112 |
| abstract_inverted_index.mixers | 132 |
| abstract_inverted_index.models | 121 |
| abstract_inverted_index.nature | 94 |
| abstract_inverted_index.neural | 42 |
| abstract_inverted_index.number | 144 |
| abstract_inverted_index.relies | 113 |
| abstract_inverted_index.tasks, | 12 |
| abstract_inverted_index.vision | 2, 11 |
| abstract_inverted_index.Byclass | 157 |
| abstract_inverted_index.WaveMix | 38, 59, 120, 149 |
| abstract_inverted_index.another | 73 |
| abstract_inverted_index.besides | 77 |
| abstract_inverted_index.certain | 1 |
| abstract_inverted_index.desktop | 24 |
| abstract_inverted_index.improve | 90 |
| abstract_inverted_index.mixing. | 56, 119 |
| abstract_inverted_index.neither | 60 |
| abstract_inverted_index.partial | 117 |
| abstract_inverted_index.pooling | 115 |
| abstract_inverted_index.present | 37 |
| abstract_inverted_index.reduces | 99 |
| abstract_inverted_index.results | 154 |
| abstract_inverted_index.several | 134 |
| abstract_inverted_index.spatial | 54, 118 |
| abstract_inverted_index.unrolls | 61 |
| abstract_inverted_index.utilize | 82 |
| abstract_inverted_index.wavelet | 50 |
| abstract_inverted_index.Although | 0 |
| abstract_inverted_index.Balanced | 160 |
| abstract_inverted_index.achieved | 151 |
| abstract_inverted_index.compared | 106 |
| abstract_inverted_index.datasets | 135 |
| abstract_inverted_index.discrete | 49 |
| abstract_inverted_index.requires | 65 |
| abstract_inverted_index.storage. | 148 |
| abstract_inverted_index.testing. | 35 |
| abstract_inverted_index.training | 32 |
| abstract_inverted_index.(training | 141 |
| abstract_inverted_index.computing | 25 |
| abstract_inverted_index.datasets. | 161 |
| abstract_inverted_index.filtering | 79 |
| abstract_inverted_index.inductive | 74 |
| abstract_inverted_index.quadratic | 68 |
| abstract_inverted_index.requiring | 137 |
| abstract_inverted_index.structure | 85 |
| abstract_inverted_index.testing), | 143 |
| abstract_inverted_index.transform | 51 |
| abstract_inverted_index.generalize | 8 |
| abstract_inverted_index.introduces | 72 |
| abstract_inverted_index.alternative | 41 |
| abstract_inverted_index.competitive | 126 |
| abstract_inverted_index.complexity. | 69 |
| abstract_inverted_index.impractical | 16 |
| abstract_inverted_index.multi-scale | 47, 93 |
| abstract_inverted_index.requirement | 101 |
| abstract_inverted_index.transformer | 3 |
| abstract_inverted_index.architecture | 43, 105 |
| abstract_inverted_index.requirements | 30 |
| abstract_inverted_index.Additionally, | 70 |
| abstract_inverted_index.architectures | 7 |
| abstract_inverted_index.computational | 29 |
| abstract_inverted_index.computations, | 146 |
| abstract_inverted_index.convolutional | 78 |
| abstract_inverted_index.generalization | 123 |
| abstract_inverted_index.self-attention | 66 |
| abstract_inverted_index.generalization. | 91 |
| abstract_inverted_index.State-of-the-art | 152 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 2 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/8 |
| sustainable_development_goals[0].score | 0.47999998927116394 |
| sustainable_development_goals[0].display_name | Decent work and economic growth |
| citation_normalized_percentile |
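
The `abstract_inverted_index.*` rows in the payload store the abstract as a map from each word to the positions where it occurs. A minimal sketch of how such a map can be turned back into running text follows; the helper name `rebuild_abstract` is hypothetical, and the sample dictionary is a hand-copied subset covering only positions 0 to 7 of the rows above, not the full index.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a {word: [positions]} inverted index (hypothetical helper)."""
    # Invert the mapping: one entry per position, holding the word found there.
    word_at = {}
    for word, positions in inverted_index.items():
        for position in positions:
            word_at[position] = word
    # Emit the words in position order, separated by single spaces.
    return " ".join(word_at[position] for position in sorted(word_at))


# Subset of the payload rows (positions 0-7 only), copied for illustration.
sample_index = {
    "Although": [0],
    "certain": [1],
    "vision": [2],
    "transformer": [3],
    "(ViT)": [4],
    "and": [5],
    "CNN": [6],
    "architectures": [7],
}
opening_words = rebuild_abstract(sample_index)
```

Punctuation is attached to the words in the index (for example `CNNs,` and `mixing.`), so joining on spaces is enough to recover the original sentence boundaries.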