Tei‐Wei Kuo
Practicalizing Tree-Based Model Acceleration with CAM through Model Pruning and Data Placement Optimization
ReCross: Efficient Embedding Reduction Scheme for In-Memory Computing using ReRAM-Based Crossbar
Deep learning-based recommendation models (DLRMs) are widely deployed in commercial applications to enhance user experience. However, the large and sparse embedding layers in these models impose substantial memory bandwidth bottlenecks due…
Retrieval-Augmented Generation for Natural Language Processing: A Survey
Large language models (LLMs) have demonstrated great success in various fields, benefiting from their huge number of parameters that store knowledge. However, LLMs still suffer from several key issues, such as hallucination problems, knowle…
RETENTION: Resource-Efficient Tree-Based Ensemble Model Acceleration with Content-Addressable Memory
Although deep learning has demonstrated remarkable capabilities in learning from unstructured data, modern tree-based ensemble models remain superior in extracting relevant information and learning from structured datasets. While several e…
Easz: An Agile Transformer-based Image Compression Framework for Resource-constrained IoTs
Neural image compression, necessary in various machine-to-machine communication scenarios, suffers from its heavy encode-decode structures and inflexibility in switching between different compression levels. Consequently, it raises signifi…
Search-in-Memory (SiM): Reliable, Versatile, and Efficient Data Matching in SSD's NAND Flash Memory Chip for Data Indexing Acceleration
To index the increasing volume of data, modern data indexes are typically stored on SSDs and cached in DRAM. However, searching such an index has resulted in significant I/O traffic due to limited access locality and inefficient cache util…
RAEE: A Robust Retrieval-Augmented Early Exiting Framework for Efficient Inference
Deploying large language model inference remains challenging due to their high computational overhead. Early exiting optimizes model inference by adaptively reducing the number of inference layers. Existing methods typically train internal…
ReFusion: Improving Natural Language Understanding with Computation-Efficient Retrieval Representation Fusion
Retrieval-based augmentations (RA) incorporating knowledge from an external database into language models have greatly succeeded in various knowledge-intensive (KI) tasks. However, integrating retrievals in non-knowledge-intensive (NKI) ta…
Pipette: Efficient Fine-Grained Reads for SSDs
Big data applications, such as recommendation systems and social networks, often generate a huge number of fine-grained reads to the storage. Block-oriented storage devices upon the traditional storage system rely on the paging mechanism to …
BiTrackGAN: Cascaded CycleGANs to Constraint Face Aging
With the increased accuracy of modern computer vision technology, many access control systems are equipped with face recognition functions for faster identification. In order to maintain high recognition accuracy, it is necessary to keep t…
Variational Nested Dropout
Nested dropout is a variant of dropout operation that is able to order network parameters or features based on the pre-defined importance during training. It has been explored for: I. Constructing nested nets Cui et al. 2020, Cui et al. 20…
Bits-Ensemble: Toward Light-Weight Robust Deep Ensemble by Bits-Sharing
Robustness and uncertainty estimation are crucial to the safety of deep neural networks (DNNs) deployed on the edge. The deep ensemble model, composed of a set of individual DNNs (namely members), has strong performance in accuracy, uncerta…
Message from the General and Program Chairs
The 11th IEEE Non-Volatile Memory Systems and Application Symposium (NVMSA) is a premier conference for new ideas and research results in the area of non-volatile memory systems and emerging memory technologies. This year, NVMSA was held hy…
Pipette
Big data applications, such as recommendation systems and social networks, often generate a huge number of fine-grained reads to the storage. Block-oriented storage devices tend to suffer from these fine-grained read operations in terms of I…
NFL: Robust Learned Index via Distribution Transformation
Recent works on learned index open a new direction for the indexing field. The key insight of the learned index is to approximate the mapping between keys and positions with piece-wise linear functions. Such methods require partitioning ke…
RM-SSD: In-Storage Computing for Large-Scale Recommendation Inference
To meet the strict service level agreement requirements of recommendation systems, the entire set of embeddings in recommendation systems needs to be loaded into the memory. However, as the model and dataset for production-scale recommenda…
A Fast Transformer-based General-Purpose Lossless Compressor
Deep-learning-based compressors have received interest recently due to their much improved compression ratios. However, modern approaches suffer from long execution time. To ease this problem, this paper targets cutting down the execution time …
SEOFP-NET: Compression and Acceleration of Deep Neural Networks for Speech Enhancement Using Sign-Exponent-Only Floating-Points
Numerous compression and acceleration strategies have achieved outstanding results on classification tasks in various fields. Nevertheless, the same strategies may yield unsatisfactory performance on regression tasks because the nature bet…
SEOFP-NET: Compression and Acceleration of Deep Neural Networks for Speech Enhancement Using Sign-Exponent-Only Floating-Points
Numerous compression and acceleration strategies have achieved outstanding results on classification tasks in various fields, such as computer vision and speech signal processing. Nevertheless, the same strategies have yielded ungratified …
Intermittent Speech Recovery.
A large number of Internet of Things (IoT) devices today are powered by batteries, which are often expensive to maintain and may cause serious environmental pollution. To avoid these problems, researchers have begun to consider the use of …
Speech Recovery for Real-World Self-powered Intermittent Devices
The incompleteness of speech inputs severely degrades the performance of all the related speech signal processing applications. Although many studies have been proposed to address this issue, they controlled the data missing conditions …
PASSLEAF: A Pool-bAsed Semi-Supervised LEArning Framework for Uncertain Knowledge Graph Embedding
In this paper, we study the problem of embedding uncertain knowledge graphs, where each relation between entities is associated with a confidence score. Observing that existing embedding methods may discard the uncertainty information, only…
Variational Nested Dropout
Nested dropout is a variant of dropout operation that is able to order network parameters or features based on the pre-defined importance during training. It has been explored for: I. Constructing nested nets: the nested nets are neural ne…
Fully Nested Neural Network for Adaptive Compression and Quantization
Neural network compression and quantization are important tasks for fitting state-of-the-art models into the computational, memory and power constraints of mobile devices and embedded hardware. Recent approaches to model compression/quanti…
Spatiotemporal Super-Resolution with Cross-Task Consistency and Its Semi-supervised Extension
Spatiotemporal super-resolution (SR) aims to upscale both the spatial and temporal dimensions of input videos, and produces videos with higher frame resolutions and rates. It involves two essential sub-tasks: spatial SR and temporal SR. We…