arXiv (Cornell University)
Trainable Fixed-Point Quantization for Deep Learning Acceleration on FPGAs
January 2024 • Dingyi Dai, Yichi Zhang, Jiahao Zhang, Zhanqiu Hu, Yaohui Cai, Qi Sun, Zhiru Zhang
Quantization is a crucial technique for deploying deep learning models on resource-constrained devices, such as embedded FPGAs. Prior efforts mostly focus on quantizing matrix multiplications, leaving other layers like BatchNorm or shortcuts in floating-point form, even though fixed-point arithmetic is more efficient on FPGAs. A common practice is to fine-tune a pre-trained model to fixed-point for FPGA deployment, but potentially degrading accuracy. This work presents QFX, a novel trainable fixed-point quantizati…