UnIT: Scalable Unstructured Inference-Time Pruning for MAC-efficient Neural Inference on MCUs
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2507.07885
Existing pruning methods are typically applied during training or compile time and often rely on structured sparsity. While compatible with low-power microcontrollers (MCUs), structured pruning underutilizes the opportunity for fine-grained efficiency on devices without SIMD support or parallel compute. To address these limitations, we introduce UnIT (Unstructured Inference-Time pruning), a lightweight method that dynamically identifies and skips unnecessary multiply-accumulate (MAC) operations during inference, guided by input-specific activation patterns. Unlike structured pruning, UnIT embraces irregular sparsity and does not require retraining or hardware specialization. It transforms pruning decisions into lightweight comparisons, replacing multiplications with threshold checks and approximated divisions. UnIT further optimizes compute by reusing threshold computations across multiple connections and applying layer- and group-specific pruning sensitivity. We present three fast, hardware-friendly division approximations tailored to the capabilities of common embedded platforms. Demonstrated on the MSP430 microcontroller, UnIT achieves 11.02% to 82.03% MAC reduction, 27.30% to 84.19% faster inference, and 27.33% to 84.38% lower energy consumption compared to training-time pruned models, while maintaining accuracy within 0.48% to 7%. Under domain shift, UnIT matches or exceeds the accuracy of retrained models while requiring significantly fewer MACs. These results establish unstructured inference-time pruning as a viable and practical solution for efficient, retraining-free deployment of deep neural networks on MCUs.
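The abstract says UnIT turns pruning decisions into threshold checks and approximated divisions and reuses threshold computations across connections, but it does not give the exact criterion. The following is a minimal Python sketch under stated assumptions: a MAC is skipped when |w * x| would fall below a per-layer threshold `t`; the quotient `t / |x|` is computed once per input activation and reused for every weight that input feeds; and that division is approximated by rounding |x| up to a power of two and shifting. The names `approx_div_pow2` and `dense_skip_macs`, the threshold value, and the shift-based approximation are hypothetical illustrations. They are not the authors' code and not necessarily one of the paper's three division approximations. Plain Python integers stand in for the fixed-point values an MCU would use.

```python
from typing import List, Sequence, Tuple

# Hypothetical sketch: the |w * x| < t skip criterion and the power-of-two
# division are illustrative assumptions, not UnIT's published implementation.


def approx_div_pow2(t: int, d: int) -> int:
    """Approximate t // d (d >= 1) by rounding d up to a power of two and shifting.

    Because 2**shift >= d, the result never exceeds the exact quotient, so a
    threshold computed this way can only under-prune, never over-prune.
    """
    shift = 0
    while (1 << shift) < d:  # find the smallest power of two >= d
        shift += 1
    return t >> shift  # a shift replaces the division


def dense_skip_macs(
    w: Sequence[Sequence[int]], x: Sequence[int], t: int
) -> Tuple[List[int], int]:
    """Compute y[j] = sum_i w[j][i] * x[i], skipping only MACs guaranteed to
    have |w[j][i] * x[i]| < t. Returns the outputs and the skipped-MAC count.
    """
    n_out = len(w)
    y = [0] * n_out
    skipped = 0
    for i, xi in enumerate(x):
        if xi == 0:
            skipped += n_out  # every product with a zero input is zero
            continue
        # One approximate division per input activation ...
        thr = approx_div_pow2(t, abs(xi))
        # ... reused for every outgoing connection of that input.
        for j in range(n_out):
            wji = w[j][i]
            if abs(wji) < thr:  # a comparison replaces the multiplication
                skipped += 1
                continue
            y[j] += wji * xi
    return y, skipped


if __name__ == "__main__":
    w = [[1, 50, -3, 8],
         [40, -2, 6, -100]]  # 2 outputs x 4 inputs, toy integer weights
    x = [10, 0, -20, 3]      # toy integer activations
    y, skipped = dense_skip_macs(w, x, t=64)
    print(y, skipped)  # → [60, -20] 4
```

Because the power-of-two threshold never exceeds the exact quotient, every skipped MAC really has |w * x| < t; the price is some under-pruning (the product (-3)(-20) = 60 is kept even though 60 < 64). An unpruned layer would return [94, -20], so the two skipped nonzero terms (10 and 24) account for the drift in y[0]. Passing a different `t` per layer or per group of weights is one hypothetical way to mirror the layer- and group-specific sensitivity the abstract mentions.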
- Type: preprint
- Language: en
- Landing page: http://arxiv.org/abs/2507.07885
- PDF: https://arxiv.org/pdf/2507.07885
- OA status: green
- OpenAlex ID: https://openalex.org/W4416287587
- Publication year: 2025
- Publication date: 2025-07-10
- Authors: Ashe Neth, Sawinder Kaur, Subrata Biswas, Asif Salekin, Bashima Islam
- Open access: yes
- Cited by: 0 (OpenAlex)
Full payload
| id | https://openalex.org/W4416287587 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2507.07885 |
| ids.doi | https://doi.org/10.48550/arxiv.2507.07885 |
| ids.openalex | https://openalex.org/W4416287587 |
| fwci | |
| type | preprint |
| title | UnIT: Scalable Unstructured Inference-Time Pruning for MAC-efficient Neural Inference on MCUs |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2507.07885 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2507.07885 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2507.07885 |
| locations[1].id | doi:10.48550/arxiv.2507.07885 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2507.07885 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5120357356 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Ashe Neth |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Neth, Ashe |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5009379906 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-4500-1053 |
| authorships[1].author.display_name | Sawinder Kaur |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | kaur, Sawinder |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5061646605 |
| authorships[2].author.orcid | https://orcid.org/0000-0001-5959-8666 |
| authorships[2].author.display_name | Subrata Biswas |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Biswas, Subrata |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5065842715 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-0807-8967 |
| authorships[3].author.display_name | Asif Salekin |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Salekin, Asif |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5035909313 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-1917-054X |
| authorships[4].author.display_name | Bashima Islam |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Islam, Bashima |
| authorships[4].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2507.07885 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | UnIT: Scalable Unstructured Inference-Time Pruning for MAC-efficient Neural Inference on MCUs |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-28T10:18:10.449767 |
| primary_topic | |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2507.07885 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2507.07885 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2507.07885 |
| primary_location.id | pmh:oai:arXiv.org:2507.07885 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2507.07885 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2507.07885 |
| publication_date | 2025-07-10 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 5 |
| citation_normalized_percentile | |