Paolo Ienne
Out with LSQs: Custom Circuits for Memory Access Reordering in Dynamic HLS
This repository contains the source code, benchmarks, and experimental results associated with the paper "Out with LSQs: Custom Circuits for Memory Access Reordering in Dynamic HLS" by Rouzbeh Pirayadi (EPFL), Ayatallah Elakhras (EPFL), Mi…
Survival of the Fastest: Enabling More Out-of-Order Execution in Dataflow Circuits
Dynamically scheduled HLS, through dataflow circuit generation, has proven successful at exploiting operation-level parallelism in several important situations where statically scheduled HLS fails. Yet, although existing dataflow circuits …
Introduction to the Special Section on FPGA 2022
Author: Paolo Ienne, École Polytechnique Fédérale de Lausanne (EPFL), Switzerland (ORCID 0000-0002-6142-7345).
Fast Parallel Algorithms for Enumeration of Simple, Temporal, and Hop-constrained Cycles
Cycles are one of the fundamental subgraph patterns, and being able to enumerate them in graphs enables important applications in a wide variety of fields, including finance, biology, chemistry, and network science. However, to enable cycle…
Resource Sharing in Dataflow Circuits
To achieve resource-efficient hardware designs, high-level synthesis (HLS) tools share (i.e., time-multiplex) functional units among operations of the same type. This optimization is typically performed in conjunction with operation schedu…
Exploring FPGA Switch-Blocks without Explicitly Listing Connectivity Patterns
Increased lower metal resistance makes physical aspects of Field-Programmable Gate Array (FPGA) switch-blocks more relevant than before. The need to navigate a design space where each individual switch can have significant impact on the FP…
Straight to the Queue: Fast Load-Store Queue Allocation in Dataflow Circuits
Dynamically scheduled high-level synthesis can exploit high levels of parallelism in poorly-predictable control-dominated applications. Yet, dataflow circuits are often generated by literal conversion of basic blocks into circuits intercon…
Regularity Matters: Designing Practical FPGA Switch-Blocks
Several techniques have been proposed for automatically searching for FPGA switch-blocks which typically show some tangible advantage over the well-known academic architectures. However, the resulting switch-blocks usually exhibit high lev…
Fast Parallel Algorithms for Enumeration of Simple, Temporal, and Hop-Constrained Cycles
Cycles are one of the fundamental subgraph patterns, and being able to enumerate them in graphs enables important applications in a wide variety of fields, including finance, biology, chemistry, and network science. However, to enable cycle…
A Comprehensive Timing Model for Accurate Frequency Tuning in Dataflow Circuits
The ability of dataflow circuits to implement dynamic scheduling promises to overcome the conservatism of static scheduling techniques that high-level synthesis tools typically rely on. Yet, the same distributed control mechanism that allo…
Unleashing Parallelism in Elastic Circuits with Faster Token Delivery
High-level synthesis (HLS) is the process of automatically generating circuits out of high-level language descriptions. Previous research has shown that dynamically scheduled HLS through elastic circuit generation is successful at exploiti…
Scalable Fine-Grained Parallel Cycle Enumeration Algorithms
Enumerating simple cycles has important applications in computational biology, network science, and financial crime analysis. In this work, we focus on parallelising the state-of-the-art simple cycle enumeration algorithms by Johnson an…
Resource Sharing in Dataflow Circuits
To achieve resource-efficient hardware designs, HLS tools share (i.e., time-multiplex) functional units among operations of the same type. This optimization is typically performed together with operation scheduling to ensure the best possi…
Detailed Placement for Dedicated LUT-Level FPGA Interconnect
In this work, we develop timing-driven CAD support for FPGA architectures with direct connections between LUTs. We do so by proposing an efficient ILP-based detailed placer, which moves a carefully selected subset of LUTs from their origin…
Request, Coalesce, Serve, and Forget: Miss-Optimized Memory Systems for Bandwidth-Bound Cache-Unfriendly Applications on FPGAs
Applications such as large-scale sparse linear algebra and graph analytics are challenging to accelerate on FPGAs due to the short irregular memory accesses, resulting in low cache hit rates. Nonblocking caches reduce the bandwidth require…
Buffer Placement and Sizing for High-Performance Dataflow Circuits
Commercial high-level synthesis tools typically produce statically scheduled circuits. Yet, effective C-to-circuit conversion of arbitrary software applications calls for dataflow circuits, as they can efficiently handle variable latencies…
How Many CPU Cores is an FPGA Worth? Lessons Learned from Accelerating String Sorting on a CPU-FPGA System
String sorting is a fundamental kernel of string matching and database index construction; yet, it has not been studied as extensively as the sorting of fixed-length keys. Because processing variable-length keys in hardware is challenging, it is …
From C/C++ Code to High-Performance Dataflow Circuits
High-level synthesis (HLS) tools typically generate statically scheduled datapaths. Static scheduling implies that the resulting circuits have a hard time exploiting parallelism in code with potential memory dependences, with control depen…
Large-Scale Graph Processing on FPGAs with Caches for Thousands of Simultaneous Misses
Efficient large-scale graph processing is crucial to many disciplines. Yet, while graph algorithms naturally expose massive parallelism opportunities, their performance is limited by the memory system because of irregular memory accesses. …
DASS: Combining Dynamic & Static Scheduling in High-Level Synthesis
A central task in high-level synthesis is scheduling: the allocation of operations to clock cycles. The classic approach to scheduling is static, in which each operation is mapped to a clock cycle at compile-time, but recent years have see…
Global Is the New Local: FPGA Architecture at 5nm and Beyond
It takes only high-school physics to appreciate that the resistance of a wire grows with a diminishing cross section, and a quick look at any plot about Moore's law immediately suggests that such cross section must decrease over time. Clea…
Resource Sharing in Dataflow Circuits
To achieve resource-efficient hardware designs, high-level synthesis tools share functional units among operations of the same type. This optimization is typically performed in conjunction with operation scheduling to ensure the best possi…
Synthesizing General-Purpose Code Into Dynamically Scheduled Circuits
Since their inception more than thirty years ago, field-programmable gate arrays (FPGAs) have been widely used to implement a myriad of applications from different domains. As a result of their low-level hardware reconfigurability, FPGAs h…
Timing-Driven Placement for FPGA Architectures with Dedicated Routing Paths
The idea of introducing dedicated, fast paths between certain FPGA elements in order to reduce delay is neither new nor particularly hard to come up with. What is less obvious, however, is how to put such paths to actual use. In this work,…
FPGAs in the Datacenters: the Case of Parallel Hybrid Super Scalar String Sample Sort
String sorting is an important part of database and MapReduce applications; however, it has not been studied as extensively as sorting of fixed-length keys. Handling variable-length keys in hardware is challenging, and it is no surprise tha…
Manycore clique enumeration with fast set intersections
Listing all maximal cliques of a given graph has important applications in the analysis of social and biological networks. Parallelisation of maximal clique enumeration (MCE) algorithms on modern manycore processors is challenging due to t…