Vec-LUT: Vector Table Lookup for Parallel Ultra-Low-Bit LLM Inference on Edge Devices
2025 · Open Access
· DOI: https://doi.org/10.48550/arxiv.2512.06443
Large language models (LLMs) are increasingly deployed on edge devices. To meet strict resource constraints, real-world deployment has pushed LLM quantization from 8-bit to 4-bit, 2-bit, and now 1.58-bit. Combined with lookup table (LUT)-based inference, CPUs run these ultra-low-bit LLMs even faster than NPUs, opening new opportunities for ubiquitous on-device intelligence. However, this paper identifies that LUT-based inference underutilizes memory bandwidth during parallel inference, which is required for prefilling, test-time scaling, and other multi-token scenarios. The root cause is the scalar LUT paradigm, which performs repetitive and non-contiguous memory accesses for each token. To solve the issue, we propose vector LUT, a new lookup paradigm that constructs a unified LUT across parallel tokens, and performs a single $1 \rightarrow N$ lookup per index. To realize it efficiently, we further introduce (1) Vector LUT-Centric Tensor Layout, and (2) Cache-Aware Streamed Lookup techniques. Evaluations on 5 edge devices across 3 LLMs show that Vec-LUT outperforms state-of-the-art baselines by up to $4.2\times$. Our implementation is integrated into llama.cpp. The code is available at https://github.com/Cipherxzc/vlut.cpp.
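The difference between the scalar and vector lookup paradigms described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes ternary weights in {-1, 0, 1} (the usual reading of "1.58-bit"), a hypothetical group size `G = 2`, and plain Python lists in place of the paper's tensor layout and streamed lookup. The names `encode_weights`, `scalar_lut_matmul`, and `vector_lut_matmul` are made up for illustration.

```python
from itertools import product

G = 2  # weights covered by one lookup index (hypothetical choice)
PATTERNS = list(product((-1, 0, 1), repeat=G))      # all 3**G ternary patterns
PATTERN_INDEX = {p: i for i, p in enumerate(PATTERNS)}


def encode_weights(w_row):
    """Turn a ternary weight row into one LUT index per group of G weights.

    Assumes len(w_row) is a multiple of G.
    """
    return [PATTERN_INDEX[tuple(w_row[k:k + G])] for k in range(0, len(w_row), G)]


def group_dot(pattern, group):
    """Partial dot product of one weight pattern with one activation group."""
    return sum(p * x for p, x in zip(pattern, group))


def scalar_lut_matmul(index_rows, acts):
    """Scalar LUT: a separate table per token; each lookup returns one scalar."""
    out = [[0.0] * len(acts) for _ in index_rows]
    for t, a in enumerate(acts):
        # tables[g][idx] holds the partial sum for token t only.
        tables = [[group_dot(pat, a[k:k + G]) for pat in PATTERNS]
                  for k in range(0, len(a), G)]
        for r, idx_row in enumerate(index_rows):
            # The same weight indices are re-read for every token.
            out[r][t] = sum(tables[g][idx] for g, idx in enumerate(idx_row))
    return out


def vector_lut_matmul(index_rows, acts):
    """Vector LUT: one unified table; each lookup returns N values (one per token)."""
    n_tokens = len(acts)
    k_dim = len(acts[0])
    # tables[g][idx] is a contiguous list of N partial sums, one per token.
    tables = [[[group_dot(pat, a[k:k + G]) for a in acts] for pat in PATTERNS]
              for k in range(0, k_dim, G)]
    out = []
    for idx_row in index_rows:
        acc = [0.0] * n_tokens
        for g, idx in enumerate(idx_row):
            vec = tables[g][idx]                    # single 1 -> N lookup
            acc = [s + v for s, v in zip(acc, vec)]
        out.append(acc)
    return out


# Two ternary weight rows and two parallel tokens (made-up values).
weights = [[1, -1, 0, 1], [0, 0, -1, 1]]
acts = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0]]
index_rows = [encode_weights(row) for row in weights]
scalar_result = scalar_lut_matmul(index_rows, acts)
vector_result = vector_lut_matmul(index_rows, acts)
```

Both functions compute the same product. The vector variant reads each weight index once and accumulates a contiguous run of N values, which is the access pattern the abstract contrasts with the repetitive, non-contiguous per-token lookups of the scalar paradigm.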
- Type: preprint
- Landing Page: https://doi.org/10.48550/arxiv.2512.06443
- OA Status: green
- OpenAlex ID: https://openalex.org/W7110787080
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W7110787080 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2512.06443 (Digital Object Identifier)
- Title: Vec-LUT: Vector Table Lookup for Parallel Ultra-Low-Bit LLM Inference on Edge Devices (work title)
- Type: preprint (OpenAlex work type)
- Publication year: 2025 (year of publication)
- Publication date: 2025-12-06 (full publication date if available)
- Authors: Xiangyu Li, Chengyu Yin, Weijun Wang, Jianyu Wei, Ting Cao, Yunxin Liu (list of authors in order)
- Landing page: https://doi.org/10.48550/arxiv.2512.06443 (publisher landing page)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://doi.org/10.48550/arxiv.2512.06443 (direct OA link when available)
- Concepts: Lookup table, Computer science, Table (database), Parallel computing, Bandwidth (computing), Software deployment, Enhanced Data Rates for GSM Evolution, Inference, Memory bandwidth, Edge device, Computer engineering, Algorithm, Computer hardware, Computer architecture, Quantization (signal processing), Vector quantization, Key (lock), Code (set theory), Scalar (mathematics), High memory, Resource (disambiguation), Theoretical computer science, Memory management (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W7110787080 |
| doi | https://doi.org/10.48550/arxiv.2512.06443 |
| ids.doi | https://doi.org/10.48550/arxiv.2512.06443 |
| ids.openalex | https://openalex.org/W7110787080 |
| fwci | |
| type | preprint |
| title | Vec-LUT: Vector Table Lookup for Parallel Ultra-Low-Bit LLM Inference on Edge Devices |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C134835016 |
| concepts[0].level | 2 |
| concepts[0].score | 0.8553929328918457 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q690265 |
| concepts[0].display_name | Lookup table |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.7922654151916504 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C45235069 |
| concepts[2].level | 2 |
| concepts[2].score | 0.5679431557655334 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q278425 |
| concepts[2].display_name | Table (database) |
| concepts[3].id | https://openalex.org/C173608175 |
| concepts[3].level | 1 |
| concepts[3].score | 0.5348172783851624 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q232661 |
| concepts[3].display_name | Parallel computing |
| concepts[4].id | https://openalex.org/C2776257435 |
| concepts[4].level | 2 |
| concepts[4].score | 0.3962109386920929 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q1576430 |
| concepts[4].display_name | Bandwidth (computing) |
| concepts[5].id | https://openalex.org/C105339364 |
| concepts[5].level | 2 |
| concepts[5].score | 0.3701060116291046 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q2297740 |
| concepts[5].display_name | Software deployment |
| concepts[6].id | https://openalex.org/C162307627 |
| concepts[6].level | 2 |
| concepts[6].score | 0.3698514997959137 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q204833 |
| concepts[6].display_name | Enhanced Data Rates for GSM Evolution |
| concepts[7].id | https://openalex.org/C2776214188 |
| concepts[7].level | 2 |
| concepts[7].score | 0.3684523105621338 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q408386 |
| concepts[7].display_name | Inference |
| concepts[8].id | https://openalex.org/C188045654 |
| concepts[8].level | 2 |
| concepts[8].score | 0.3683995008468628 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q17148339 |
| concepts[8].display_name | Memory bandwidth |
| concepts[9].id | https://openalex.org/C138236772 |
| concepts[9].level | 3 |
| concepts[9].score | 0.36734962463378906 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q25098575 |
| concepts[9].display_name | Edge device |
| concepts[10].id | https://openalex.org/C113775141 |
| concepts[10].level | 1 |
| concepts[10].score | 0.36000025272369385 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q428691 |
| concepts[10].display_name | Computer engineering |
| concepts[11].id | https://openalex.org/C11413529 |
| concepts[11].level | 1 |
| concepts[11].score | 0.347671240568161 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q8366 |
| concepts[11].display_name | Algorithm |
| concepts[12].id | https://openalex.org/C9390403 |
| concepts[12].level | 1 |
| concepts[12].score | 0.3261003792285919 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q3966 |
| concepts[12].display_name | Computer hardware |
| concepts[13].id | https://openalex.org/C118524514 |
| concepts[13].level | 1 |
| concepts[13].score | 0.3253815472126007 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q173212 |
| concepts[13].display_name | Computer architecture |
| concepts[14].id | https://openalex.org/C28855332 |
| concepts[14].level | 2 |
| concepts[14].score | 0.31982406973838806 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q198099 |
| concepts[14].display_name | Quantization (signal processing) |
| concepts[15].id | https://openalex.org/C199833920 |
| concepts[15].level | 2 |
| concepts[15].score | 0.2984366714954376 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q612536 |
| concepts[15].display_name | Vector quantization |
| concepts[16].id | https://openalex.org/C26517878 |
| concepts[16].level | 2 |
| concepts[16].score | 0.29369208216667175 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q228039 |
| concepts[16].display_name | Key (lock) |
| concepts[17].id | https://openalex.org/C2776760102 |
| concepts[17].level | 3 |
| concepts[17].score | 0.2903432250022888 |
| concepts[17].wikidata | https://www.wikidata.org/wiki/Q5139990 |
| concepts[17].display_name | Code (set theory) |
| concepts[18].id | https://openalex.org/C57691317 |
| concepts[18].level | 2 |
| concepts[18].score | 0.28066372871398926 |
| concepts[18].wikidata | https://www.wikidata.org/wiki/Q1289248 |
| concepts[18].display_name | Scalar (mathematics) |
| concepts[19].id | https://openalex.org/C2781357197 |
| concepts[19].level | 2 |
| concepts[19].score | 0.27792441844940186 |
| concepts[19].wikidata | https://www.wikidata.org/wiki/Q5757597 |
| concepts[19].display_name | High memory |
| concepts[20].id | https://openalex.org/C206345919 |
| concepts[20].level | 2 |
| concepts[20].score | 0.26674115657806396 |
| concepts[20].wikidata | https://www.wikidata.org/wiki/Q20380951 |
| concepts[20].display_name | Resource (disambiguation) |
| concepts[21].id | https://openalex.org/C80444323 |
| concepts[21].level | 1 |
| concepts[21].score | 0.26656287908554077 |
| concepts[21].wikidata | https://www.wikidata.org/wiki/Q2878974 |
| concepts[21].display_name | Theoretical computer science |
| concepts[22].id | https://openalex.org/C176649486 |
| concepts[22].level | 3 |
| concepts[22].score | 0.25269007682800293 |
| concepts[22].wikidata | https://www.wikidata.org/wiki/Q2308807 |
| concepts[22].display_name | Memory management |
| keywords[0].id | https://openalex.org/keywords/lookup-table |
| keywords[0].score | 0.8553929328918457 |
| keywords[0].display_name | Lookup table |
| keywords[1].id | https://openalex.org/keywords/table |
| keywords[1].score | 0.5679431557655334 |
| keywords[1].display_name | Table (database) |
| keywords[2].id | https://openalex.org/keywords/bandwidth |
| keywords[2].score | 0.3962109386920929 |
| keywords[2].display_name | Bandwidth (computing) |
| keywords[3].id | https://openalex.org/keywords/software-deployment |
| keywords[3].score | 0.3701060116291046 |
| keywords[3].display_name | Software deployment |
| keywords[4].id | https://openalex.org/keywords/enhanced-data-rates-for-gsm-evolution |
| keywords[4].score | 0.3698514997959137 |
| keywords[4].display_name | Enhanced Data Rates for GSM Evolution |
| keywords[5].id | https://openalex.org/keywords/inference |
| keywords[5].score | 0.3684523105621338 |
| keywords[5].display_name | Inference |
| keywords[6].id | https://openalex.org/keywords/memory-bandwidth |
| keywords[6].score | 0.3683995008468628 |
| keywords[6].display_name | Memory bandwidth |
| keywords[7].id | https://openalex.org/keywords/edge-device |
| keywords[7].score | 0.36734962463378906 |
| keywords[7].display_name | Edge device |
| language | |
| locations[0].id | doi:10.48550/arxiv.2512.06443 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | cc-by |
| locations[0].pdf_url | |
| locations[0].version | |
| locations[0].raw_type | article |
| locations[0].license_id | https://openalex.org/licenses/cc-by |
| locations[0].is_accepted | False |
| locations[0].is_published | |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | https://doi.org/10.48550/arxiv.2512.06443 |
| indexed_in | datacite |
| authorships[0].author.id | https://openalex.org/A2013038159 |
| authorships[0].author.orcid | https://orcid.org/0009-0004-4358-3178 |
| authorships[0].author.display_name | Li Xiangyu |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Li, Xiangyu |
| authorships[0].is_corresponding | True |
| authorships[1].author.id | https://openalex.org/A2355064811 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Yin Cheng-yu |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Yin, Chengyu |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A1423196259 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Wang Wei-jun |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Wang, Weijun |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A2212456283 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Wei Jianyu |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Wei, Jianyu |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A2163164736 |
| authorships[4].author.orcid | https://orcid.org/0000-0003-2189-7024 |
| authorships[4].author.display_name | Cao, Ting |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Cao, Ting |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A2350497426 |
| authorships[5].author.orcid | |
| authorships[5].author.display_name | Liu Yunxin |
| authorships[5].author_position | last |
| authorships[5].raw_author_name | Liu, Yunxin |
| authorships[5].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://doi.org/10.48550/arxiv.2512.06443 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-12-10T00:00:00 |
| display_name | Vec-LUT: Vector Table Lookup for Parallel Ultra-Low-Bit LLM Inference on Edge Devices |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-12-10T02:49:46.989445 |
| primary_topic | |
| cited_by_count | 0 |
| locations_count | 1 |
| best_oa_location.id | doi:10.48550/arxiv.2512.06443 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | cc-by |
| best_oa_location.pdf_url | |
| best_oa_location.version | |
| best_oa_location.raw_type | article |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | https://doi.org/10.48550/arxiv.2512.06443 |
| primary_location.id | doi:10.48550/arxiv.2512.06443 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | cc-by |
| primary_location.pdf_url | |
| primary_location.version | |
| primary_location.raw_type | article |
| primary_location.license_id | https://openalex.org/licenses/cc-by |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | https://doi.org/10.48550/arxiv.2512.06443 |
| publication_date | 2025-12-06 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.3 | 147 |
| abstract_inverted_index.5 | 143 |
| abstract_inverted_index.a | 101, 107, 115 |
| abstract_inverted_index.$1 | 117 |
| abstract_inverted_index.N$ | 119 |
| abstract_inverted_index.To | 10, 93, 123 |
| abstract_inverted_index.at | 169 |
| abstract_inverted_index.by | 155 |
| abstract_inverted_index.is | 65, 78, 161, 167 |
| abstract_inverted_index.it | 125 |
| abstract_inverted_index.on | 7, 142 |
| abstract_inverted_index.to | 23, 157 |
| abstract_inverted_index.up | 156 |
| abstract_inverted_index.we | 97, 127 |
| abstract_inverted_index.(1) | 130 |
| abstract_inverted_index.(2) | 136 |
| abstract_inverted_index.LLM | 19 |
| abstract_inverted_index.LUT | 81, 109 |
| abstract_inverted_index.Our | 159 |
| abstract_inverted_index.The | 75, 165 |
| abstract_inverted_index.and | 26, 71, 86, 113, 135 |
| abstract_inverted_index.are | 4 |
| abstract_inverted_index.for | 47, 67, 90 |
| abstract_inverted_index.has | 17 |
| abstract_inverted_index.new | 45, 102 |
| abstract_inverted_index.now | 27 |
| abstract_inverted_index.per | 121 |
| abstract_inverted_index.run | 36 |
| abstract_inverted_index.the | 79, 95 |
| abstract_inverted_index.CPUs | 35 |
| abstract_inverted_index.LLMs | 39, 148 |
| abstract_inverted_index.LUT, | 100 |
| abstract_inverted_index.code | 166 |
| abstract_inverted_index.each | 91 |
| abstract_inverted_index.edge | 8, 144 |
| abstract_inverted_index.even | 40 |
| abstract_inverted_index.from | 21 |
| abstract_inverted_index.into | 163 |
| abstract_inverted_index.meet | 11 |
| abstract_inverted_index.root | 76 |
| abstract_inverted_index.show | 149 |
| abstract_inverted_index.than | 42 |
| abstract_inverted_index.that | 55, 105, 150 |
| abstract_inverted_index.this | 52 |
| abstract_inverted_index.with | 30 |
| abstract_inverted_index.8-bit | 22 |
| abstract_inverted_index.Large | 0 |
| abstract_inverted_index.NPUs, | 43 |
| abstract_inverted_index.cause | 77 |
| abstract_inverted_index.other | 72 |
| abstract_inverted_index.paper | 53 |
| abstract_inverted_index.solve | 94 |
| abstract_inverted_index.table | 32 |
| abstract_inverted_index.these | 37 |
| abstract_inverted_index.which | 64, 83 |
| abstract_inverted_index.(LLMs) | 3 |
| abstract_inverted_index.2-bit, | 25 |
| abstract_inverted_index.4-bit, | 24 |
| abstract_inverted_index.Lookup | 139 |
| abstract_inverted_index.Tensor | 133 |
| abstract_inverted_index.Vector | 131 |
| abstract_inverted_index.across | 110, 146 |
| abstract_inverted_index.during | 61 |
| abstract_inverted_index.faster | 41 |
| abstract_inverted_index.index. | 122 |
| abstract_inverted_index.issue, | 96 |
| abstract_inverted_index.lookup | 31, 103, 120 |
| abstract_inverted_index.memory | 59, 88 |
| abstract_inverted_index.models | 2 |
| abstract_inverted_index.pushed | 18 |
| abstract_inverted_index.scalar | 80 |
| abstract_inverted_index.single | 116 |
| abstract_inverted_index.strict | 12 |
| abstract_inverted_index.token. | 92 |
| abstract_inverted_index.vector | 99 |
| abstract_inverted_index.Layout, | 134 |
| abstract_inverted_index.Vec-LUT | 151 |
| abstract_inverted_index.devices | 145 |
| abstract_inverted_index.further | 128 |
| abstract_inverted_index.opening | 44 |
| abstract_inverted_index.propose | 98 |
| abstract_inverted_index.realize | 124 |
| abstract_inverted_index.tokens, | 112 |
| abstract_inverted_index.unified | 108 |
| abstract_inverted_index.Combined | 29 |
| abstract_inverted_index.However, | 51 |
| abstract_inverted_index.Streamed | 138 |
| abstract_inverted_index.accesses | 89 |
| abstract_inverted_index.deployed | 6 |
| abstract_inverted_index.devices. | 9 |
| abstract_inverted_index.language | 1 |
| abstract_inverted_index.paradigm | 104 |
| abstract_inverted_index.parallel | 62, 111 |
| abstract_inverted_index.performs | 84, 114 |
| abstract_inverted_index.required | 66 |
| abstract_inverted_index.resource | 13 |
| abstract_inverted_index.scaling, | 70 |
| abstract_inverted_index.1.58-bit. | 28 |
| abstract_inverted_index.LUT-based | 56 |
| abstract_inverted_index.available | 168 |
| abstract_inverted_index.bandwidth | 60 |
| abstract_inverted_index.baselines | 154 |
| abstract_inverted_index.inference | 57 |
| abstract_inverted_index.introduce | 129 |
| abstract_inverted_index.on-device | 49 |
| abstract_inverted_index.paradigm, | 82 |
| abstract_inverted_index.test-time | 69 |
| abstract_inverted_index.constructs | 106 |
| abstract_inverted_index.deployment | 16 |
| abstract_inverted_index.identifies | 54 |
| abstract_inverted_index.inference, | 34, 63 |
| abstract_inverted_index.integrated | 162 |
| abstract_inverted_index.llama.cpp. | 164 |
| abstract_inverted_index.real-world | 15 |
| abstract_inverted_index.repetitive | 85 |
| abstract_inverted_index.scenarios. | 74 |
| abstract_inverted_index.ubiquitous | 48 |
| abstract_inverted_index.(LUT)-based | 33 |
| abstract_inverted_index.Cache-Aware | 137 |
| abstract_inverted_index.Evaluations | 141 |
| abstract_inverted_index.LUT-Centric | 132 |
| abstract_inverted_index.\rightarrow | 118 |
| abstract_inverted_index.multi-token | 73 |
| abstract_inverted_index.outperforms | 152 |
| abstract_inverted_index.prefilling, | 68 |
| abstract_inverted_index.techniques. | 140 |
| abstract_inverted_index.$4.2\times$. | 158 |
| abstract_inverted_index.constraints, | 14 |
| abstract_inverted_index.efficiently, | 126 |
| abstract_inverted_index.increasingly | 5 |
| abstract_inverted_index.quantization | 20 |
| abstract_inverted_index.intelligence. | 50 |
| abstract_inverted_index.opportunities | 46 |
| abstract_inverted_index.ultra-low-bit | 38 |
| abstract_inverted_index.underutilizes | 58 |
| abstract_inverted_index.implementation | 160 |
| abstract_inverted_index.non-contiguous | 87 |
| abstract_inverted_index.state-of-the-art | 153 |
| abstract_inverted_index.https://github.com/Cipherxzc/vlut.cpp. | 170 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 6 |
| citation_normalized_percentile | |
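
The `abstract_inverted_index.*` rows above store the abstract as a map from each word to the positions where it occurs. As a minimal sketch, the readable text can be recovered by inverting that map and joining the words in position order. The helper name `rebuild_abstract` is hypothetical, and the sample dictionary below holds only the first ten positions copied from the table, not the full payload.

```python
def rebuild_abstract(inverted_index):
    """Rebuild running text from a {word: [positions]} inverted index."""
    by_position = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = word
    # Join the words in ascending position order.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# First ten positions of the index, taken from the rows above.
sample_index = {
    "Large": [0],
    "language": [1],
    "models": [2],
    "(LLMs)": [3],
    "are": [4],
    "increasingly": [5],
    "deployed": [6],
    "on": [7],
    "edge": [8],
    "devices.": [9],
}
first_sentence = rebuild_abstract(sample_index)
```

Applied to the sample, this yields the opening sentence of the abstract. A word that occurs several times, such as `on` at positions 7 and 142 in the full index, is simply placed at each of its positions.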