Snitch: A Tiny Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads
· 2020
· Open Access
· DOI: https://doi.org/10.1109/tc.2020.3027900
· OA: W3013692244
Data-parallel applications, such as data analytics, machine learning, and scientific computing, place an ever-growing demand on floating-point operations per second on emerging systems. With increasing integration density, the quest for energy efficiency becomes the number one design concern. While dedicated accelerators provide high energy efficiency, they are over-specialized and hard to adapt to algorithmic changes. We propose an architectural concept that tackles the issue of achieving extreme energy efficiency while still maintaining high flexibility as a general-purpose compute engine. The key idea is to pair a tiny 10 kGE control core, called Snitch, with a double-precision FPU to adjust the compute-to-control ratio. Minimizing non-FPU area and achieving high floating-point utilization have traditionally been a trade-off; with Snitch, we achieve both by enhancing the ISA with two minimally intrusive extensions: stream semantic registers (SSR) and a floating-point repetition instruction (FREP). SSRs allow the core to implicitly encode load/store instructions as register reads/writes, eliding many explicit memory instructions. The FREP extension decouples the floating-point and integer pipelines by sequencing instructions from a micro-loop buffer. These ISA extensions significantly reduce the pressure on the core and free it up for other tasks, making Snitch and the FPU effectively dual-issue at a minimal incremental cost of 3.2%. The two low-overhead ISA extensions make Snitch more flexible than a contemporary vector processor lane, achieving a $2\times$ energy-efficiency improvement. We have evaluated the proposed core and ISA extensions on an octa-core cluster in 22 nm technology. We achieve more than $5\times$ multi-core speed-up and a $3.5\times$ gain in energy efficiency on several parallel microkernels.
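To make the effect of the two extensions concrete, the sketch below shows a double-precision dot product in plain C and annotates, in comments, which instructions SSRs and FREP would remove from the issue stream. The C function is generic; the instruction-level mapping in the comments is an illustrative assumption based on the abstract's description, not the paper's exact instruction sequences or runtime API.

```c
#include <stddef.h>

/* Conceptual sketch (not the paper's code): a double-precision dot product.
 * On a plain in-order RISC-V core, each iteration issues roughly:
 *   two fld (load a[i], b[i]) + one fmadd.d + address increments + branch.
 * With the two ISA extensions described in the abstract:
 *   - SSRs: reads of a[i] and b[i] become reads of stream-mapped FP
 *     registers, so the explicit loads and address bookkeeping disappear
 *     from the instruction stream.
 *   - FREP: the remaining fmadd.d is repeated n times from a micro-loop
 *     buffer, so the integer pipeline no longer issues the loop body at all
 *     and is free for other work (the "pseudo dual-issue" effect).
 */
double dot(const double *a, const double *b, size_t n) {
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i) {
        acc += a[i] * b[i];   /* one fused multiply-add per element */
    }
    return acc;
}
```

Under this reading, the steady-state work left for the core is essentially the FPU datapath itself, which is how high floating-point utilization can coexist with a tiny non-FPU area.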