Dataflow architecture
FlowSense: A Natural Language Interface for Visual Data Exploration within a Dataflow System
Dataflow visualization systems enable flexible visual data exploration by allowing the user to construct a dataflow diagram that composes query and visualization modules to specify system functionality. However, learning dataflow diagram us…
Accelerating Scientific Applications With SambaNova Reconfigurable Dataflow Architecture
Here, our exploratory work finds that the SambaNova Reconfigurable Dataflow Architecture (RDA) along with the SambaFlow software stack provides for an attractive system and solution to accelerate AI for science workloads. We have observed …
Dataflow Management in the Internet of Things: Sensing, Control, and Security
The pervasiveness of the smart Internet of Things (IoT) enables many electric sensors and devices to be connected and generates a large volume of dataflow. Compared with traditional big data, the streaming dataflow is faced with represent…
FLAT: An Optimized Dataflow for Mitigating Attention Bottlenecks
Attention mechanisms, primarily designed to capture pairwise correlations between words, have become the backbone of machine learning, expanding beyond natural language processing into other domains. This growth in adaptation comes at the …
Spada: Accelerating Sparse Matrix Multiplication with Adaptive Dataflow
Sparse matrix-matrix multiplication (SpGEMM) is widely used in many scientific and deep learning applications. The highly irregular structures of SpGEMM limit its performance and efficiency on conventional computation platforms, and thus m…
Buffer Placement and Sizing for High-Performance Dataflow Circuits
Commercial high-level synthesis tools typically produce statically scheduled circuits. Yet, effective C-to-circuit conversion of arbitrary software applications calls for dataflow circuits, as they can efficiently handle variable latencies…
Decaf: Decoupled Dataflows for In Situ High-Performance Workflows
Decaf is a dataflow system for the parallel communication of coupled tasks in an HPC workflow. The dataflow can perform arbitrary data transformations ranging from simply forwarding data to complex data redistribution. Decaf does this by a…
Incremental, iterative data processing with timely dataflow
We describe the timely dataflow model for distributed computation and its implementation in the Naiad system. The model supports stateful iterative and incremental computations. It enables both low-latency stream processing and high-throug…
Explaining outputs in modern data analytics
We report on the design and implementation of a general framework for interactively explaining the outputs of modern data-parallel computations, including iterative data analytics. To produce explanations, existing works adopt a naive back…
The Sparse Abstract Machine
We propose the Sparse Abstract Machine (SAM), an abstract machine model for targeting sparse tensor algebra to reconfigurable and fixed-function spatial dataflow accelerators. SAM defines a streaming dataflow abstraction with sparse primit…
From functional programs to pipelined dataflow circuits
We present a translation from programs expressed in a functional IR into dataflow networks as an intermediate step within a Haskell-to-Hardware compiler. Our networks exploit pipeline parallelism, particularly across multiple tail-recursiv…
Speculative Dataflow Circuits
With FPGAs facing broader application domains, the conversion of imperative languages into dataflow circuits has been recently revamped as a way to overcome the conservatism of statically scheduled high-level synthesis. Apart from the abil…
Execution and Cache Performance of the Scheduled Dataflow Architecture
This paper presents an evaluation of our Scheduled Dataflow (SDF) Processor. Recent focus in the field of new processor architectures is mainly on VLIW (e.g. IA-64), superscalar and superspeculative architectures. This trend allows for bet…
Roofline-Model-Based Design Space Exploration for Dataflow Techniques of CNN Accelerators
To effectively compute convolutional layers, a complex design space must exist (e.g., the dataflow techniques associated with the layer parameters, loop transformation techniques, and hardware parameters). For efficient design space explor…
Energy-Efficient High-Speed ASIC Implementation of Convolutional Neural Network Using Novel Reduced Critical-Path Design
Convolutional Neural Networks (CNNs) play an important role in several machine learning tasks related to speech, image, and video processing applications. The increasing demand for faster processing in real-time applications requires high-s…
Towards Collaborative Optimization of Cluster Configurations for Distributed Dataflow Jobs
Analyzing large datasets with distributed dataflow systems requires the use of clusters. Public cloud providers offer a large variety and quantity of resources that can be used for such clusters. However, picking the appropriate resources …
MPNA: A Massively-Parallel Neural Array Accelerator with Dataflow Optimization for Convolutional Neural Networks
State-of-the-art accelerators for Convolutional Neural Networks (CNNs) typically focus on accelerating only the convolutional layers and give little priority to the fully-connected layers. Hence, they lack a synergistic optimization of…
SnailTrail: Generalizing Critical Paths for Online Analysis of Distributed Dataflows
We rigorously generalize critical path analysis (CPA) to long-running and streaming computations and present SnailTrail, a system built on Timely Dataflow, which applies our analysis to a range of popular distributed dataflow engines. Our …
A Comprehensive Timing Model for Accurate Frequency Tuning in Dataflow Circuits
The ability of dataflow circuits to implement dynamic scheduling promises to overcome the conservatism of static scheduling techniques that high-level synthesis tools typically rely on. Yet, the same distributed control mechanism that allo…
PAPIFY: Automatic Instrumentation and Monitoring of Dynamic Dataflow Applications Based on PAPI
RTL-Aware Dataflow-Driven Macro Placement
Eliminating Excessive Dynamism of Dataflow Circuits Using Model Checking
Recent HLS efforts explore the generation of dynamically scheduled, dataflow circuits from high-level code; their ability to adapt the schedule at runtime to particular data and control outcomes promises superior performance to standard, s…
DPACS: Hardware Accelerated Dynamic Neural Network Pruning through Algorithm-Architecture Co-design
By eliminating compute operations intelligently based on the run time input, dynamic pruning (DP) promises to improve deep neural network inference speed substantially without incurring a major impact on their accuracy. Although many DP al…
Straight to the Queue: Fast Load-Store Queue Allocation in Dataflow Circuits
Dynamically scheduled high-level synthesis can exploit high levels of parallelism in poorly-predictable control-dominated applications. Yet, dataflow circuits are often generated by literal conversion of basic blocks into circuits intercon…
Two Fundamental Issues in Multiprocessing: The Dataflow Solution
To exploit the parallelism inherent in algorithms, any multiprocessor system must address two very basic issues: long memory latencies and waits for synchronization events. It is argued on the basis of the evolution of high performance co…
Multi-Core Dataflow Design and Implementation of Secure Hash Algorithm-3
Embedded multi-core systems are implemented as systems-on-chip that rely on packet store-and-forward networks-on-chip for communications. These systems do not use buses or a global clock. Instead, routers are used to move data between the core…
A Composable Dynamic Sparse Dataflow Architecture for Efficient Event-based Vision Processing on FPGA
Event-based vision represents a paradigm shift in how vision information is captured and processed. By only responding to dynamic intensity changes in the scene, event-based sensing produces far less data than conventional frame-based came…
Optimal Dataflow Scheduling on a Heterogeneous Multiprocessor With Reduced Response Time Bounds
Heterogeneous computing platforms with multiple types of computing resources have been widely used in many industrial systems to process dataflow tasks with pre-defined affinity of tasks to subgroups of resources. For many dataflow workloa…
Resource Sharing in Dataflow Circuits
To achieve resource-efficient hardware designs, HLS tools share (i.e., time-multiplex) functional units among operations of the same type. This optimization is typically performed together with operation scheduling to ensure the best possi…
Enel: Context-Aware Dynamic Scaling of Distributed Dataflow Jobs using Graph Propagation
Distributed dataflow systems like Spark and Flink enable the use of clusters for scalable data analytics. While runtime prediction models can be used to initially select appropriate cluster resources given target runtimes, the actual ru…