Joshua Mack
RIMMS: Runtime Integrated Memory Management System for Heterogeneous Computing
Efficient memory management in heterogeneous systems is increasingly challenging due to diverse compute architectures (e.g., CPU, GPU, and FPGA) and dynamic task mappings not known at compile time. Existing approaches often require program…
Coarse-Grained Task Parallelization by Dynamic Profiling for Heterogeneous SoC-Based Embedded System
In this study, we introduce a methodology for automatically transforming user applications written in C/C++ to a parallel representation consisting of coarse-grained tasks based on dynamic profiling. Such a parallel representation is suita…
Tutorial: A Novel Runtime Environment for Accelerator-Rich Heterogeneous Architectures
As the landscape of computing advances, system designers are increasingly exploring methodologies that leverage higher levels of heterogeneity to enhance performance within constrained size, weight, power, and cost parameters. CEDR (Compil…
GPU-RANC: A CUDA Accelerated Simulation Framework for Neuromorphic Architectures
Open-source simulation tools play a crucial role for neuromorphic application engineers and hardware architects to investigate performance bottlenecks and explore design optimizations before committing to silicon. Reconfigurable Architectu…
Cyclebite: Extracting Task Graphs From Unstructured Compute-Programs
Extracting portable performance in an application requires structuring that program into a data-flow graph of coarse-grained tasks (CGTs). Structuring applications that interconnect multiple external libraries and custom code (i.e., “Code…
CEDR-API: Productive, Performant Programming of Domain-Specific Embedded Systems
As the computing landscape evolves, system designers continue to explore design methodologies that leverage increased levels of heterogeneity to push performance within limited size, weight, power, and cost budgets. One such methodology is…
Profile-Guided Parallel Task Extraction and Execution for Domain Specific Heterogeneous SoC
In this study, we introduce a methodology for automatically transforming user applications in the radar and communication domain written in C/C++ based on dynamic profiling to a parallel representation targeted for a heterogeneous SoC. We …
A Hardware-based HEFT Scheduler Implementation for Dynamic Workloads on Heterogeneous SoCs
Non-uniform performance and power consumption across the processing elements (PEs) of heterogeneous SoCs increase the computation complexity of the task scheduling problem compared to homogeneous architectures. Latency of a software-based …
CEDR: A Compiler-integrated, Extensible DSSoC Runtime
In this work, we present a Compiler-integrated, Extensible Domain Specific System on Chip Runtime (CEDR) ecosystem to facilitate research toward addressing the challenges of architecture, system software, and application development wi…
Performant, Multi-objective Scheduling of Highly Interleaved Task Graphs on Heterogeneous System on Chip Devices
Performance-, power-, and energy-aware scheduling techniques play an essential role in optimally utilizing processing elements (PEs) of heterogeneous systems. List schedulers, a class of low-complexity static schedulers, have commonly been…
RANC: Reconfigurable Architecture for Neuromorphic Computing
Neuromorphic architectures have been introduced as platforms for energy efficient spiking neural network execution. The massive parallelism offered by these architectures has also triggered interest from non-machine learning application…
FPGA Based Emulation Environment for Neuromorphic Architectures
Neuromorphic architectures such as IBM's TrueNorth and Intel's Loihi have been introduced as platforms for energy efficient spiking neural network execution. However, there is no framework that allows for rapidly experimenting with neuromo…
User-Space Emulation Framework for Domain-Specific SoC Design
In this work, we propose a portable, Linux-based emulation framework to provide an ecosystem for hardware-software co-design of Domain-specific SoCs (DSSoCs) and enable their rapid evaluation during the pre-silicon design phase. This frame…
DS3: A System-Level Domain-Specific System-on-Chip Simulation Framework
Heterogeneous systems-on-chip (SoCs) are highly favorable computing platforms due to their superior performance and energy efficiency potential compared to homogeneous architectures. They can be further tailored to a specific domain of app…
Work-in-Progress: A Simulation Framework for Domain-Specific System-on-Chips
Heterogeneous system-on-chips (SoCs) have become the standard embedded computing platforms due to their potential to deliver superior performance and energy efficiency compared to homogeneous architectures. They can be particularly suited …
CORDIC-based Architecture for Powering Computation in Fixed-Point Arithmetic
We present a fixed point architecture (source VHDL code is provided) for powering computation. The fully customized architecture, based on the expanded hyperbolic CORDIC algorithm, allows for design space exploration to establish trade-off…