Puneet Gupta
YAP+: Pad-Layout-Aware Yield Modeling and Simulation for Hybrid Bonding
Near-energy-free photonic Fourier transformation for convolution operation acceleration
ChipletPart: Cost-Aware Partitioning for 2.5D Systems
Industry adoption of chiplets has been increasing as a cost-effective option for making larger high-performance systems. Consequently, partitioning large systems into chiplets is increasingly important. In this work, we introduce ChipletPa…
FRED: A Wafer-scale Fabric for 3D Parallel DNN Training
Optimizing Base Layer Design Rule Checks in Chip Physical Design
This article presents a comprehensive analysis of abstract modeling approaches for base layer design rule checks in advanced semiconductor design. As semiconductor technology continues to advance toward smaller nodes, the complexity of bas…
CATCH: a Cost Analysis Tool for Co-optimization of chiplet-based Heterogeneous systems
With the increasing prevalence of chiplet systems in high-performance computing applications, the number of design options has increased dramatically. Instead of chips defaulting to a single die design, now there are options for 2.5D and 3…
Machine Learning-Enhanced Greedy Algorithm for Optimizing Hold Time Violations in Advanced Node SoC Designs
This article presents an innovative approach to resolving hold time violations in advanced technology nodes using a greedy algorithm methodology. The article addresses critical challenges in modern System-on-Chip (SoC) designs, particularl…
Experimental validation of a novel characterization procedure based on fast sweep measurements for linear resonators with a large time constant
FRED: Flexible REduction-Distribution Interconnect and Communication Implementation for Wafer-Scale Distributed Training of DNN Models
Distributed Deep Neural Network (DNN) training is a technique to reduce the training overhead by distributing the training tasks into multiple accelerators, according to a parallelization strategy. However, high-performance compute and int…
Smoothing Disruption Across the Stack: Tales of Memory, Heterogeneity, & Compilers
Experimental Validation of a Novel Characterization Technique for Linear Resonators with a Large Time Constant
DeepFlow: A Cross-Stack Pathfinding Framework for Distributed AI Systems
Over the past decade, machine learning model complexity has grown at an extraordinary rate, as has the scale of the systems training such large models. However, there is an alarmingly low hardware utilization (5–20%) in large scale AI syst…
ReFOCUS: Reusing Light for Efficient Fourier Optics-Based Photonic Neural Network Accelerator
In recent years, there has been a significant focus on achieving low-latency and high-throughput convolutional neural network (CNN) inference. Integrated photonics offers the potential to substantially expedite neural networks due to its i…
Cost-Driven Hardware-Software Co-Optimization of Machine Learning Pipelines
Researchers have long touted a vision of the future enabled by a proliferation of internet-of-things devices, including smart sensors, homes, and cities. Increasingly, embedding intelligence in such devices involves the use of deep neural …
End-to-end differentiability and tensor processing unit computing to accelerate materials’ inverse design
Numerical simulations have revolutionized material design. However, although simulations excel at mapping an input material to its output property, their direct application to inverse design has traditionally been limited by their high com…
Training Neural Networks for Execution on Approximate Hardware
Approximate computing methods have shown great potential for deep learning. Due to the reduced hardware costs, these methods are especially suitable for inference tasks on battery-operated devices that are constrained by their power budget…
A Nonvolatile Compute-in-Memory Macro Using Voltage-Controlled MRAM and In Situ Magnetic-to-Digital Converter
Compute-in-memory (CIM) accelerators have become a popular solution to achieve high energy efficiency for deep learning applications in edge devices. Recent works have demonstrated CIM macros using nonvolatile memories [spin transfer torque …
PhotoFourier: A Photonic Joint Transform Correlator-Based Neural Network Accelerator
The last few years have seen a lot of work to address the challenge of low-latency and high-throughput convolutional neural network inference. Integrated photonics has the potential to dramatically accelerate neural networks because of its…
DeepFlow: A Cross-Stack Pathfinding Framework for Distributed AI Systems
Over the past decade, machine learning model complexity has grown at an extraordinary rate, as has the scale of the systems training such large models. However, there is an alarmingly low hardware utilization (5–20%) in large scale AI syste…
High‐Throughput Multichannel Parallelized Diffraction Convolutional Neural Network Accelerator
Convolutional neural networks are paramount in image and signal processing, and are responsible for the majority of image recognition power consumption today, concentrated mainly in convolution computations. With convolution operations bei…
Bit-serial Weight Pools: Compression and Arbitrary Precision Execution of Neural Networks on Resource Constrained Processors
Applications of neural networks on edge systems have proliferated in recent years, but ever-increasing model sizes make it hard to deploy neural networks efficiently on resource-constrained microcontrollers. We propose bit-serial weigh…
High Throughput Multi-Channel Parallelized Diffraction Convolutional Neural Network Accelerator
Convolutional neural networks are paramount in image and signal processing, including the relevant classification and training tasks, and account for the majority of machine learning compute demand today. With convolution operations…
Lightweight Software-Defined Error Correction for Memories
Reliability of the memory subsystem is a growing concern in computer architecture and system design. From on-chip embedded memories in Internet-of-Things (IoT) devices and on-chip caches to off-chip main memories, the memory subsystems hav…
Massively parallel amplitude-only Fourier neural network
Machine intelligence has become a driving factor in modern society. However, its demand outpaces the underlying electronic technology due to limitations given by fundamental physics, such as capacitive charging of wires, but also by system…
Channel Tiling for Improved Performance and Accuracy of Optical Neural Network Accelerators
Low-latency, high-throughput inference on Convolutional Neural Networks (CNNs) remains a challenge, especially for applications requiring large input or large kernel sizes. 4F optics provides a solution to accelerate CNNs by converting convo…
Pathfinding for 2.5D interconnect technologies
As conventional technology scaling becomes harder, 2.5D integration provides a viable pathway to building larger systems at lower cost. Consequently, there has recently been a proliferation of multiple 2.5D integration technologies that offer…
Perceived sources of stress amongst Indian dental students in Bareilly city
Introduction: In addition to the stresses pertaining to dentistry as a profession, dental students have to face the additional stress of their studies. Stress can also contribute to decreased student performance. The aim of this st…
Smart Hoover with Mower
This paper presents advances in the design and development of a vacuum cleaner with a lawn mower. It focuses on developing a handy automated vacuum cleaner and lawn mower robot that operates on Arduino programming and ca…
Implant Surface Microtopography – A Review
Osseointegration is the direct contact between living bone and the implant surface, without interposed soft tissue at the microscopic level, and it is a critical process for implant stability and consequent short- and long-term clinical s…
MOMBAT: Heart Rate Monitoring from Face Video using Pulse Modeling and Bayesian Tracking
A non-invasive yet inexpensive method for heart rate (HR) monitoring is of great importance in many real-world applications including healthcare, psychology understanding, affective computing and biometrics. Face videos are currently ut…