An Open-Source Framework for Efficient Numerically-Tailored Computations
2023 · Open Access
DOI: https://doi.org/10.1109/fpl60245.2023.00011
OpenAlex: W4388214817
We present a versatile open-source framework designed to facilitate efficient, numerically-tailored Matrix-Matrix Multiplications (MMMs). The framework offers two primary contributions: first, a fine-tuned, automated pipeline for arithmetic datapath generation, enabling highly customizable systolic MMM kernels; second, seamless integration of the generated kernels into user code, irrespective of the programming language employed, without necessitating modifications.

The framework demonstrates a systematic enhancement in accuracy per energy cost across diverse High Performance Computing (HPC) workloads displaying a variety of numerical requirements, such as Artificial Intelligence (AI) inference and Sea Surface Height (SSH) computation. For AI inference, we consider a set of state-of-the-art neural network models, namely ResNet18, ResNet34, ResNet50, DenseNet121, DenseNet161, DenseNet169, and VGG11, in conjunction with two datasets, two computer formats, and 27 distinct intermediate arithmetic datapaths. Our approach consistently reduces energy consumption across all cases, with a notable example being the reduction by factors of $3.3\times$ for IEEE754-32 and $1.4\times$ for Bfloat16 during ImageNet inference with ResNet50. This is accomplished while maintaining accuracies of $82.3\%$ and $86\%$, comparable to those achieved with conventional Floating-Point Units (FPUs). In the context of SSH computation, our method achieves fully-reproducible results using double-precision words, surpassing the accuracy of conventional double- and quad-precision arithmetic in FPUs. Our approach enhances SSH computation accuracy by a minimum of $5\times$ and $27\times$ compared to IEEE754-64 and IEEE754-128, respectively, resulting in $5.6\times$ and $15.1\times$ improvements in accuracy per power cost.
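To make the "no source modifications" integration claim concrete, the sketch below shows ordinary user code that multiplies matrices through the standard CBLAS GEMM interface. One common way such a call can be redirected to a generated systolic MMM kernel on Linux is by swapping the BLAS library at link or load time (for example via `LD_PRELOAD`); the abstract does not state that the framework uses exactly this mechanism, and the preload library name below is hypothetical. Only `cblas_dgemm` and its standard arguments are assumed here.

```c
/* Illustrative sketch: user code written against the standard CBLAS API.
 * Because the call site is the generic GEMM symbol, a drop-in library
 * backed by a numerically-tailored systolic kernel could replace the
 * default implementation without editing this file. */
#include <stdio.h>
#include <cblas.h>

int main(void) {
    /* 2x2 matrices in row-major order; compute C = 1.0 * A * B + 0.0 * C */
    double A[4] = {1.0, 2.0, 3.0, 4.0};
    double B[4] = {5.0, 6.0, 7.0, 8.0};
    double C[4] = {0.0};

    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 2,     /* M, N, K        */
                1.0, A, 2,   /* alpha, A, lda  */
                B, 2,        /* B, ldb         */
                0.0, C, 2);  /* beta, C, ldc   */

    printf("C = [%g %g; %g %g]\n", C[0], C[1], C[2], C[3]);
    return 0;
}
```

Compiled with `cc main.c -lcblas`, the same binary could then be run as-is or as `LD_PRELOAD=./libcustom_mmm.so ./a.out` to pick up a substitute GEMM; since many languages (Python/NumPy, Fortran, Julia, R) ultimately dispatch to the same BLAS symbols, this kind of substitution is one plausible route to language-agnostic kernel integration.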