NeUQI: Near-Optimal Uniform Quantization Parameter Initialization
May 2025 • Lin Li, Xinyu Hu, Xiaojun Wan
Large language models (LLMs) achieve impressive performance across domains but face significant challenges when deployed on consumer-grade GPUs or personal devices such as laptops, due to high memory consumption and inference costs. Post-training quantization (PTQ) of LLMs offers a promising solution that reduces their memory footprint and decoding latency. In practice, PTQ with a uniform quantization representation is favored for its efficiency and ease of deployment, since uniform quantization is widely supported by …
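To ground the terminology, the sketch below shows round-to-nearest uniform (affine) quantization with the common min-max initialization of the scale and zero-point, i.e. the standard parameter initialization that NeUQI's near-optimal initialization is positioned against. This is a minimal illustration under assumed conventions: the function names and the NumPy implementation are not from the paper's code.

```python
import numpy as np

def minmax_init(w, bits):
    """Min-max initialization of uniform quantization parameters.

    Maps the range [w.min(), w.max()] onto the integer grid
    [0, 2**bits - 1], the usual baseline initialization.
    """
    qmax = 2 ** bits - 1
    scale = (w.max() - w.min()) / qmax
    zero_point = np.round(-w.min() / scale)
    return scale, zero_point

def quantize_dequantize(w, scale, zero_point, bits):
    """Round-to-nearest uniform quantization, then dequantization."""
    qmax = 2 ** bits - 1
    q = np.clip(np.round(w / scale) + zero_point, 0, qmax)
    return scale * (q - zero_point)

# Example: 4-bit quantization of a random weight matrix.
w = np.random.randn(128, 128).astype(np.float32)
scale, zp = minmax_init(w, bits=4)
w_hat = quantize_dequantize(w, scale, zp, bits=4)
print("reconstruction MSE:", np.mean((w - w_hat) ** 2))
```

In this scheme the only free parameters per weight group are the scale and zero-point, which is why their initialization largely determines low-bit reconstruction error; the paper's contribution is a better choice of these initial values than the min-max heuristic shown here.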