Zhenman Fang
Towards Accurate and Efficient Sub-8-Bit Integer Training
Neural network training is a memory- and compute-intensive task. Quantization, which enables low-bitwidth formats in training, can significantly mitigate the workload. To reduce quantization error, recent methods have developed new data fo…
PASTA: Programming and Automation Support for Scalable Task-Parallel HLS Programs on Modern Multi-Die FPGAs
In recent years, the adoption of FPGAs in datacenters has increased, with a growing number of users choosing High-Level Synthesis (HLS) as their preferred programming method. While HLS simplifies FPGA programming, one notable challenge ari…
WeConvene: Learned Image Compression with Wavelet-Domain Convolution and Entropy Model
Recently, learned image compression (LIC) has achieved great progress and even outperformed the traditional approach using DCT or discrete wavelet transform (DWT). However, LIC mainly reduces spatial redundancy in the autoencoder networks a…
Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers
Vision transformers (ViTs) have demonstrated their superior accuracy for computer vision tasks compared to convolutional neural networks (CNNs). However, ViT models are often computation-intensive for efficient deployment on resource-limit…
HiSpMV: Hybrid Row Distribution and Vector Buffering for Imbalanced SpMV Acceleration on FPGAs
Sparse matrix-vector multiplication (SpMV) is a fundamental operation in numerous applications such as scientific computing, machine learning, and graph analytics. While recent studies have made great progress in accelerating SpMV on HBM-e…
Learned Image Compression with Dual-Branch Encoder and Conditional Information Coding
Recent advancements in deep learning-based image compression are notable. However, prevalent schemes that employ a serial context-adaptive entropy model to enhance rate-distortion (R-D) performance are markedly slow. Furthermore, the compl…
TAPA: A Scalable Task-parallel Dataflow Programming Framework for Modern FPGAs with Co-optimization of HLS and Physical Design
In this article, we propose TAPA, an end-to-end framework that compiles a C++ task-parallel dataflow program into a high-frequency FPGA accelerator. Compared to existing solutions, TAPA has two major advantages. First, TAPA provides a set …
A Cycle-Accurate Soft Error Vulnerability Analysis Framework for FPGA-based Designs
Many aerospace and automotive applications use FPGAs in their designs due to their low power and reconfigurability requirements. Meanwhile, such applications also impose a high standard of system reliability, which makes the early-stage reli…
HeatViT: Hardware-Efficient Adaptive Token Pruning for Vision Transformers
While vision transformers (ViTs) have continuously achieved new milestones in the field of computer vision, their sophisticated network architectures with high computation and memory costs have impeded their deployment on resource-limited …
SuperYOLO: Super Resolution Assisted Object Detection in Multimodal Remote Sensing Imagery
Accurate and timely detection of multiscale small objects that contain tens of pixels from remote sensing images (RSI) remains challenging. Most of the existing solutions primarily design complex deep neural networks to learn strong feature…
TAPA: A Scalable Task-Parallel Dataflow Programming Framework for Modern FPGAs with Co-Optimization of HLS and Physical Design
In this paper, we propose TAPA, an end-to-end framework that compiles a C++ task-parallel dataflow program into a high-frequency FPGA accelerator. Compared to existing solutions, TAPA has two major advantages. First, TAPA provides a set of…
SASA: A Scalable and Automatic Stencil Acceleration Framework for Optimized Hybrid Spatial and Temporal Parallelism on HBM-based FPGAs
Stencil computation is one of the fundamental computing patterns in many application domains such as scientific computing and image processing. While there are promising studies that accelerate stencils on FPGAs, there lacks an automated a…
Auto-ViT-Acc: An FPGA-Aware Automatic Acceleration Framework for Vision Transformer with Mixed-Scheme Quantization
Vision transformers (ViTs) are emerging with significantly improved accuracy in computer vision tasks. However, their complex architecture and enormous computation/storage demand impose urgent needs for new hardware accelerator design meth…
TopSort: A High-Performance Two-Phase Sorting Accelerator Optimized on HBM-based FPGAs
The emergence of high-bandwidth memory (HBM) brings new opportunities to boost the performance of sorting acceleration on FPGAs, which was conventionally bounded by the available off-chip memory bandwidth. However, it is nontrivial for des…
Introduction to the Special Section on High-level Synthesis for FPGA: Next-generation Technologies and Applications
No abstract available.
FitAct: Error Resilient Deep Neural Networks via Fine-Grained Post-Trainable Activation Functions
Deep neural networks (DNNs) are increasingly being deployed in safety-critical systems such as personal healthcare devices and self-driving cars. In such DNN-based systems, error resilience is a top priority since faults in DNN inference c…
Stealthy Attack on Algorithmic-Protected DNNs via Smart Bit Flipping
Recently, deep neural networks (DNNs) have been deployed in safety-critical systems such as autonomous vehicles and medical devices. Shortly after that, the vulnerability of DNNs was revealed by stealthy adversarial examples where crafted…
SeaPlace: Process Variation Aware Placement for Reliable Combinational Circuits against SETs and METs
Nowadays, nanoscale combinational circuits are facing significant reliability challenges including soft errors and process variations. This paper presents novel process variation-aware placement strategies that include two algorithms to inc…
BDFA: A Blind Data Adversarial Bit-flip Attack on Deep Neural Networks
Adversarial bit-flip attack (BFA) on Neural Network weights can result in catastrophic accuracy degradation by flipping a very small number of bits. A major drawback of prior bit flip attack techniques is their reliance on test data. This …
FPGA-based Near Data Processing Platform Selection Using Fast Performance Modeling (WiP Paper)
With the trend of adopting FPGAs in data centers, various FPGA acceleration platforms have been developed in recent years. Each server could incorporate one or many of these FPGAs at different compute hierarchy levels to match its workload…
Best-Effort FPGA Programming: A Few Steps Can Go a Long Way
FPGA-based heterogeneous architectures provide programmers with the ability to customize their hardware accelerators for flexible acceleration of many workloads. Nonetheless, such advantages come at the cost of sacrificing programmability.…
Revisiting FPGA Acceleration of Molecular Dynamics Simulation with Dynamic Data Flow Behavior in High-Level Synthesis
Molecular dynamics (MD) simulation is one of the past decade's most important tools for enabling biology scientists and researchers to explore human health and diseases. However, due to the computational complexity of the MD algorithm, it ta…
ARAPrototyper: Enabling Rapid Prototyping and Evaluation for Accelerator-Rich Architectures
Compared to conventional general-purpose processors, accelerator-rich architectures (ARAs) can provide orders-of-magnitude performance and energy gains and are emerging as one of the most promising solutions in the age of dark silicon. How…
Programming and Runtime Support to Blaze FPGA Accelerator Deployment at Datacenter Scale
With the end of CPU core scaling due to dark silicon limitations, customized accelerators on FPGAs have gained increased attention in modern datacenters due to their lower power, high performance and energy efficiency. Evidenced by Microso…
A quantitative analysis on microarchitectures of modern CPU-FPGA platforms
CPU-FPGA heterogeneous acceleration platforms have shown great potential for continued performance and energy efficiency improvement for modern data centers, and have captured great attention from both academia and industry. However, it is…