Mehdi Kamal
SCE-NTT: A Hardware Accelerator for Number Theoretic Transform Using Superconductor Electronics
This research explores the use of superconductor electronics (SCE) for accelerating fully homomorphic encryption (FHE), focusing on the Number-Theoretic Transform (NTT), a key computational bottleneck in FHE schemes. We present SCE-NTT, a …
MARCO: Hardware-Aware Neural Architecture Search for Edge Devices with Multi-Agent Reinforcement Learning and Conformal Prediction Filtering
This paper introduces MARCO (Multi-Agent Reinforcement learning with Conformal Optimization), a novel hardware-aware framework for efficient neural architecture search (NAS) targeting resource-constrained edge devices. By significantly red…
FAIR-SIGHT: Fairness Assurance in Image Recognition via Simultaneous Conformal Thresholding and Dynamic Output Repair
We introduce FAIR-SIGHT, an innovative post-hoc framework designed to ensure fairness in computer vision systems by combining conformal prediction with a dynamic output repair mechanism. Our approach calculates a fairness-aware non-conform…
RocketPPA: Code-Level Power, Performance, and Area Prediction via LLM and Mixture of Experts
This paper presents RocketPPA, a novel ultra-fast power, performance (delay), and area (PPA) estimator operating directly at the code-level abstraction using HDL code as input. The key technical innovation is its LLM-based regression model…
IC-D2S: A Hybrid Ising-Classical-Machines Data-Driven QUBO Solver Method
We present a heuristic algorithm designed to solve Quadratic Unconstrained Binary Optimization (QUBO) problems efficiently. The algorithm, referred to as IC-D2S, leverages a hybrid approach using Ising and classical machines to address ver…
FACTER: Fairness-Aware Conformal Thresholding and Prompt Engineering for Enabling Fair LLM-Based Recommender Systems
We propose FACTER, a fairness-aware framework for LLM-based recommendation systems that integrates conformal prediction with dynamic prompt engineering. By introducing an adaptive semantic variance threshold and a violation-triggered mecha…
Scalable superconductor neuron with ternary synaptic connections for ultra-fast SNN hardware
A novel high-fan-in differential superconductor neuron structure designed for ultra-high-performance spiking neural network (SNN) accelerators is presented. Utilizing a high-fan-in neuron structure allows us to design SNN accelerators with…
SAIM: Scalable Analog Ising Machine for Solving Quadratic Binary Optimization Problems
This paper presents a CMOS-compatible Lechner-Hauke-Zoller (LHZ)-based analog tile structure as a fundamental unit for developing scalable analog Ising machines (IMs). In the designed LHZ tile, the voltage-controlled oscillators are emplo…
MENAGE: Mixed-Signal Event-Driven Neuromorphic Accelerator for Edge Applications
This paper presents a mixed-signal neuromorphic accelerator architecture designed for accelerating inference with event-based neural network models. This fully CMOS-compatible accelerator utilizes analog computing to emulate synapse and ne…
Efficient Noise Mitigation for Enhancing Inference Accuracy in DNNs on Mixed-Signal Accelerators
In this paper, we propose a framework to enhance the robustness of the neural models by mitigating the effects of process-induced and aging-related variations of analog computing components on the accuracy of the analog neural networks. We…
On the Impact of ISA Extension on Energy Consumption of I-Cache in Extensible Processors
As is widely known, the computational speed and power consumption are two critical parameters in microprocessor design. A solution for these issues is the application specific instruction set processor (ASIP) methodology, which can improve…
Enhancing Layout Hotspot Detection Efficiency with YOLOv8 and PCA-Guided Augmentation
In this paper, we present a YOLO-based framework for layout hotspot detection, aiming to enhance the efficiency and performance of the design rule checking (DRC) process. Our approach leverages the YOLOv8 vision model to detect multiple ho…
Dynamic Co-Optimization Compiler: Leveraging Multi-Agent Reinforcement Learning for Enhanced DNN Accelerator Performance
This paper introduces a novel Dynamic Co-Optimization Compiler (DCOC), which employs an adaptive Multi-Agent Reinforcement Learning (MARL) framework to enhance the efficiency of mapping machine learning (ML) models, particularly Deep Neura…
Scalable Superconductor Neuron with Ternary Synaptic Connections for Ultra-Fast SNN Hardware
A novel high-fan-in differential superconductor neuron structure designed for ultra-high-performance Spiking Neural Network (SNN) accelerators is presented. Utilizing a high-fan-in neuron structure allows us to design SNN accelerators with…
Unsupervised SFQ-Based Spiking Neural Network
Single Flux Quantum (SFQ) technology represents a groundbreaking advancement in computational efficiency and ultra-high-speed neuromorphic processing. The key features of SFQ technology, particularly data representation, transmission, a…
Low-Precision Mixed-Computation Models for Inference on Edge
This paper presents a mixed-computation neural network processing approach for edge applications that incorporates low-precision (low-width) Posit and low-precision fixed point (FixP) number systems. This mixed-computation approach employs…
A Josephson Parametric Oscillator-Based Ising Machine
Ising machines have emerged as a promising solution for rapidly solving NP-complete combinatorial optimization problems, surpassing the capabilities of traditional computing methods. By efficiently determining the ground state of the Hamil…
ReMeCo
Memristor-based in-memory neuromorphic computing systems promise a highly efficient implementation of vector-matrix multiplications, commonly used in artificial neural networks (ANNs). However, the immature fabrication process of memristor…
Accuracy Configurable Adders with Negligible Delay Overhead in Exact Operating Mode
In this paper, two accuracy configurable adders capable of operating in approximate and exact modes are proposed. In the adders, which include a block-based carry propagate and a parallel prefix structure, the carry chains are cut off in t…
AMR-MUL: An Approximate Maximally Redundant Signed Digit Multiplier
In this paper, we present an energy-efficient, yet high-speed approximate maximally redundant signed digit (MRSD) multiplier (called AMR-MUL) based on a parallel structure. For the reduction stage, we suggest several approximate Full-Adder…
Heterogeneous Multi-core Array-based DNN Accelerator
In this article, we investigate the impact of architectural parameters of array-based DNN accelerators on accelerator's energy consumption and performance in a wide variety of network topologies. For this purpose, we have developed a tool …
Selection of Energy Upgrades for Canadian Single-Detached Residential Households Based on Occupancy Profile
The use of energy efficient building systems can play a key role in reducing energy consumption and the adverse impacts of greenhouse gas (GHG) emission. The occupancy profile of residential dwellings has a notable influence on the effecti…
A2P-MANN: Adaptive Attention Inference Hops Pruned Memory-Augmented Neural Networks
In this work, to limit the number of required attention inference hops in memory-augmented neural networks, we propose an online adaptive approach called A2P-MANN. By exploiting a small neural network classifier, an adequate number of atte…
BRDS: An FPGA-based LSTM Accelerator with Row-Balanced Dual-Ratio Sparsification
In this paper, first, a hardware-friendly pruning algorithm for reducing energy consumption and improving the speed of Long Short-Term Memory (LSTM) neural network accelerators is presented. Next, an FPGA-based platform for efficient execu…
Space Expansion of Feature Selection for Designing more Accurate Error Predictors
Approximate computing is being considered as a promising design paradigm to overcome the energy and performance challenges in computationally demanding applications. In the case where the accuracy can be configured, the quality level versu…
TheSPoT: Thermal Stress-Aware Power and Temperature Management for Multiprocessor Systems-on-Chip
Thermal stress including temperature gradients in time and space, as well as thermal cycling, influences lifetime reliability and performance of modern multiprocessor systems-on-chip (MPSoCs). Conventional power and temperature management …