Chi-Ying Tsui
A Memory-Efficient Retrieval Architecture for RAG-Enabled Wearable Medical LLMs-Agents
With powerful and integrative large language models (LLMs), medical AI agents have demonstrated unique advantages in providing personalized medical consultations, continuous health monitoring, and precise treatment plans. Retrieval-Augment…
FedLAM: Low-latency Wireless Federated Learning via Layer-wise Adaptive Modulation
In wireless federated learning (FL), the clients need to transmit the high-dimensional deep neural network (DNN) parameters through bandwidth-limited channels, which causes the communication latency issue. In this paper, we propose a layer…
STEM Education Development in Hong Kong and Its Impact to High School Students
A Flexible Precision Scaling Deep Neural Network Accelerator with Efficient Weight Combination
Deploying mixed-precision neural networks on edge devices is friendly to hardware resources and power consumption. To support fully mixed-precision neural network inference, it is necessary to design flexible hardware accelerators for cont…
SynDCIM: A Performance-Aware Digital Computing-in-Memory Compiler with Multi-Spec-Oriented Subcircuit Synthesis
Digital Computing-in-Memory (DCIM) is an innovative technology that integrates multiply-accumulation (MAC) logic directly into memory arrays to enhance the performance of modern AI computing. However, the need for customized memory cells a…
Partial Knowledge Distillation for Alleviating the Inherent Inter-Class Discrepancy in Federated Learning
Substantial efforts have been devoted to alleviating the impact of the long-tailed class distribution in federated learning. In this work, we observe an interesting phenomenon that certain weak classes consistently exist even for class-bal…
ReSCIM: Variation-Resilient High Weight-Loading Bandwidth In-Memory Computation Based on Fine-Grained Hybrid Integration of Multi-Level ReRAM and SRAM Cells
SRAM-CIM is a promising approach to implement efficient accelerator architecture as it enables accurate, energy-efficient AI computing, supporting both analog and digital computation. However, it has low area efficiency. On the other hand,…
FedAQ: Communication-Efficient Federated Edge Learning via Joint Uplink and Downlink Adaptive Quantization
Federated learning (FL) is a powerful machine learning paradigm which leverages the data as well as the computational resources of clients, while protecting clients' data privacy. However, the substantial model size and frequent aggregatio…
Energy-Efficient Channel Decoding for Wireless Federated Learning: Convergence Analysis and Adaptive Design
One of the most critical challenges for deploying distributed learning solutions, such as federated learning (FL), in wireless networks is the limited battery capacity of mobile clients. While it is a common belief that the major energy co…
A Primer for Design and Systems Thinkers: A First-Year Engineering Course for Mindset Development
Teaching students to think in complex systems and design is presumably intricate, creative, and nonlinear. However, due to the overwhelming number of standardized tools and frameworks, the process sometimes ends up being procedural and ded…
How Robust is Federated Learning to Communication Error? A Comparison Study Between Uplink and Downlink Channels
Because of its privacy-preserving capability, federated learning (FL) has attracted significant attention from both academia and industry. However, when being implemented over wireless networks, it is not clear how much communication error…
Accelerating Large Kernel Convolutions with Nested Winograd Transformation
Recent literature has shown that convolutional neural networks (CNNs) with large kernels outperform vision transformers (ViTs) and CNNs with stacked small kernels in many computer vision tasks, such as object detection and image restora…
Step-GRAND: A Low Latency Universal Soft-input Decoder
GRAND features both soft-input and hard-input variants that are well suited to efficient hardware implementations that can be characterized with achievable average and worst-case decoding latency. This paper introduces step-GRAND, a soft-i…
A 137.5 TOPS/W SRAM Compute-in-Memory Macro with 9-b Memory Cell-Embedded ADCs and Signal Margin Enhancement Techniques for AI Edge Applications
In this paper, we propose a high-precision SRAM-based CIM macro that can perform 4x4-bit MAC operations and yield 9-bit signed output. The inherent discharge branches of SRAM cells are utilized to apply time-modulated MAC and 9-bit ADC rea…
Analysis and Prevention of Coupling-Dependent Data Flipping in Wireless Power Transfer Systems
Load shift keying (LSK) is widely used in a wireless power transfer (WPT) system to backscatter secondary side information to the primary side. However, when the coupling coefficient (k) between the transmitter and the receiver coils is …
Tight Compression: Compressing CNN Through Fine-Grained Pruning and Weight Permutation for Efficient Implementation
The unstructured sparsity after pruning poses a challenge to the efficient implementation of deep learning models in existing regular architectures like systolic arrays. On the other hand, coarse-grained structured pruning is suitable for …
Energy-Efficient Dual-Node-Upset-Recoverable 12T SRAM for Low-Power Aerospace Applications
With technology scaling, transistor sizing, as well as the distance between them, is decreasing rapidly, thereby reducing the critical charge of sensitive nodes. This reduction makes SRAM cells, used for aerospace applications, more suscep…
FedDQ: Communication-Efficient Federated Learning with Descending Quantization
Federated learning (FL) is an emerging learning paradigm without violating users' privacy. However, large model size and frequent model aggregation cause serious communication bottleneck for FL. To reduce the communication volume, techniqu…
A Reconfigurable Winograd CNN Accelerator with Nesting Decomposition Algorithm for Computing Convolution with Large Filters
Recent literature found that convolutional neural networks (CNN) with large filters perform well in some applications such as image semantic segmentation. Winograd transformation helps to reduce the number of multiplications in a convoluti…
Polyimide-Based Flexible Coupled-Coils Design and Load-Shift Keying Analysis
Wireless power transfer using inductive coupling is commonly used for medical implantable devices. The design of the secondary coil on the implantable device is important as it will affect the power transfer efficiency, the size of the imp…
High Throughput Polar Decoding Using Two-Staged Adaptive Successive Cancellation List Decoding
Polar codes are the first class of capacity-achieving forward error correction (FEC) codes. They have been selected as one of the coding schemes for the 5G communication systems due to their excellent error correction performance when succ…
A Two-Staged Adaptive Successive Cancellation List Decoding for Polar Codes
Polar codes achieve outstanding error correction performance when using successive cancellation list (SCL) decoding with cyclic redundancy check. A larger list size brings better decoding performance and is essential for practical applicat…
CompRRAE
Recently Resistive-RAM (RRAM) crossbar has been used in the design of the accelerator of convolutional neural networks (CNNs) to solve the memory wall issue. However, the intensive multiply-accumulate computations (MACs) executed at the cr…
A −12.3 dBm UHF Passive RFID Sense Tag for Grid Thermal Monitoring
This paper presents an ultra-high-frequency (UHF) passive sense tag for electrical grid and substation thermal monitoring, with emphasis on the tag system optimization and the design of a low power embedded temperature sensor. The designed…
Microshift: An Efficient Image Compression Algorithm for Hardware
In this paper, we propose a lossy image compression algorithm called Microshift. We employ an algorithm-hardware co-design methodology, yielding a hardware friendly compression approach with low power consumption. In our method, the image …
A High-Throughput Architecture of List Successive Cancellation Polar Codes Decoder With Large List Size
As the first kind of forward error correction (FEC) codes that achieve channel capacity, polar codes have attracted much research interest recently. Compared with other popular FEC codes, polar codes decoded by list successive cancellation…
On Path Memory in List Successive Cancellation Decoder of Polar Codes
Polar code is a breakthrough in coding theory. Using list successive cancellation decoding with large list size L, polar codes can achieve excellent error correction performance. The L partial decoded vectors are stored in the path memory …
Optic Nerve Stimulation System with Adaptive Wireless Powering and Data Telemetry
To treat retinal degenerative diseases, a transcorneal electrical stimulation-based system is proposed, which consists of an eye implant and an external component. The eye implant is wirelessly powered and controlled by the external compon…
SparseNN: An Energy-Efficient Neural Network Accelerator Exploiting Input and Output Sparsity
Contemporary Deep Neural Network (DNN) contains millions of synaptic connections with tens to hundreds of layers. The large computation and memory requirements pose a challenge to the hardware design. In this work, we leverage the intrinsi…