Quantization (signal processing)
YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications
For years, the YOLO series has been the de facto industry-level standard for efficient object detection. The YOLO community has flourished, extending its use across a multitude of hardware platforms and application scenarios. In this…
A Survey on Learning to Hash
Nearest neighbor search is the problem of finding the data points in a database whose distances to the query point are smallest. Learning to hash is one of the major solutions to this problem and has been widely stu…
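The entry above concerns hash-based nearest neighbor search. As a minimal illustration (not the survey's own method), the sketch below performs brute-force Hamming-distance search over binary codes with NumPy; the array shapes and helper name are assumptions for the example.

```python
import numpy as np

def hamming_knn(db_codes, query_code, k=5):
    """Return indices of the k database codes closest to the query in Hamming distance.

    db_codes: (N, B) array of 0/1 bits, query_code: (B,) array of 0/1 bits.
    """
    dists = np.count_nonzero(db_codes != query_code, axis=1)  # per-row Hamming distance
    return np.argsort(dists)[:k]

# Example: 1000 random 64-bit codes, find the 5 nearest to a random query.
codes = np.random.randint(0, 2, size=(1000, 64))
query = np.random.randint(0, 2, size=64)
print(hamming_knn(codes, query))
```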
A Survey of Quantization Methods for Efficient Neural Network Inference
As soon as abstract mathematical computations were adapted to computation on digital computers, the problem of efficient representation, manipulation, and communication of the numerical values in those computations arose. Strongly related t…
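As a concrete illustration of the basic technique such surveys cover, the sketch below implements plain uniform affine quantization and dequantization of a float array to 8-bit integers. It is a minimal example, not any specific method from the survey.

```python
import numpy as np

def quantize_uniform(x, num_bits=8):
    """Uniform affine quantization: map floats in [x.min(), x.max()] to integers in [0, 2^b - 1]."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = max((x.max() - x.min()) / (qmax - qmin), 1e-12)   # guard against constant input
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize_uniform(q, scale, zero_point):
    """Approximate reconstruction of the original floats."""
    return scale * (q.astype(np.float32) - zero_point)

x = np.random.randn(4, 4).astype(np.float32)
q, s, z = quantize_uniform(x)
print(np.max(np.abs(x - dequantize_uniform(q, s, z))))   # error is on the order of one quantization step
```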
QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding
Parallel implementations of stochastic gradient descent (SGD) have received significant research attention, thanks to excellent scalability properties of this algorithm, and to its efficiency in the context of training deep neural networks…
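As a hedged sketch of the kind of stochastic gradient quantization QSGD proposes (the level count s is an assumption and the paper's lossless encoding stage is omitted), the code below quantizes a gradient vector to s levels per coordinate with unbiased stochastic rounding.

```python
import numpy as np

def qsgd_quantize(g, s=4):
    """Quantize gradient g to levels {0, 1/s, ..., 1} of its L2 norm, with stochastic rounding.

    Returns (norm, signs, integer levels); the unbiased reconstruction is norm * signs * levels / s.
    """
    norm = np.linalg.norm(g)
    if norm == 0.0:
        return norm, np.zeros_like(g), np.zeros(g.shape, dtype=np.int64)
    scaled = np.abs(g) / norm * s             # real value in [0, s]
    lower = np.floor(scaled)
    prob_up = scaled - lower                  # rounding up with this probability keeps the estimator unbiased
    levels = (lower + (np.random.rand(*g.shape) < prob_up)).astype(np.int64)
    return norm, np.sign(g), levels

def qsgd_dequantize(norm, signs, levels, s=4):
    return norm * signs * levels / s

g = np.random.randn(10)
print(qsgd_dequantize(*qsgd_quantize(g)))
```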
Large Intelligent Surface-Assisted Wireless Communication Exploiting Statistical CSI
Large intelligent surface (LIS)-assisted wireless communications have drawn attention worldwide. With the use of low-cost LIS on building walls, signals can be reflected by the LIS and sent out along desired directions by controlling its p…
FastText.zip: Compressing text classification models
We consider the problem of producing compact architectures for text classification, such that the full model fits in a limited amount of memory. After considering different solutions inspired by the hashing literature, we propose a method …
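The abstract above refers to hashing- and quantization-inspired compression of text-classification models. The sketch below shows plain product quantization of an embedding matrix (subvector-wise k-means with 8-bit codes) as one such technique, assuming scikit-learn is available; it is an illustrative simplification, not the paper's exact pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans

def pq_compress(emb, n_subvectors=4, n_centroids=256):
    """Product quantization: split each row into subvectors, run k-means per subspace,
    and store one 8-bit code per subvector (n_centroids <= 256)."""
    n, d = emb.shape
    sub_d = d // n_subvectors
    codebooks, codes = [], []
    for j in range(n_subvectors):
        sub = emb[:, j * sub_d:(j + 1) * sub_d]
        km = KMeans(n_clusters=n_centroids, n_init=4, random_state=0).fit(sub)
        codebooks.append(km.cluster_centers_)
        codes.append(km.labels_.astype(np.uint8))
    return codebooks, np.stack(codes, axis=1)          # codes: (n, n_subvectors) uint8

def pq_reconstruct(codebooks, codes):
    """Approximate the original matrix from codebooks and codes."""
    return np.concatenate([codebooks[j][codes[:, j]] for j in range(len(codebooks))], axis=1)

emb = np.random.randn(2000, 64).astype(np.float32)
books, codes = pq_compress(emb)
print(np.mean((emb - pq_reconstruct(books, codes)) ** 2))   # reconstruction error after compression
```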
DeepTrust^RT: Confidential Deep Neural Inference Meets Real-Time!
Deep Neural Networks (DNNs) are becoming common in "learning-enabled" time-critical applications such as autonomous driving and robotics. One approach to protect DNN inference from adversarial actions and preserve model privacy/confidentia…
Quantizing deep convolutional networks for efficient inference: A whitepaper
We present an overview of techniques for quantizing convolutional neural networks for inference with integer weights and activations. Per-channel quantization of weights and per-layer quantization of activations to 8-bits of precision post…
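As a small illustration of the per-channel weight quantization the whitepaper describes (symmetric 8-bit, one scale per output channel; the axis convention here is an assumption), a NumPy sketch:

```python
import numpy as np

def quantize_per_channel(w):
    """Symmetric per-channel int8 quantization of a weight tensor.

    Assumes axis 0 is the output-channel axis; each channel gets its own scale.
    """
    reduce_axes = tuple(range(1, w.ndim))
    max_abs = np.max(np.abs(w), axis=reduce_axes, keepdims=True)
    scale = np.maximum(max_abs / 127.0, 1e-12)
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_per_channel(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(8, 3, 3, 3).astype(np.float32)   # e.g. a conv kernel (out, in, kh, kw)
q, s = quantize_per_channel(w)
print(np.max(np.abs(w - dequantize_per_channel(q, s))))
```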
Trained Ternary Quantization
Deep neural networks are widely used in machine learning applications. However, large neural network models can be difficult to deploy on mobile devices with limited power budgets. To solve this problem, we propose Train…
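A minimal sketch of ternary weight quantization in the spirit of the method named above: weights are mapped to {wn, 0, wp}. In Trained Ternary Quantization the two scales are learned during training; here they are fixed arguments, and the magnitude threshold is a simplifying assumption.

```python
import numpy as np

def ternarize(w, threshold_ratio=0.05, wp=1.0, wn=-1.0):
    """Map weights to three values: wp above +t, wn below -t, and 0 in between."""
    t = threshold_ratio * np.max(np.abs(w))   # simple magnitude threshold (assumption)
    q = np.zeros_like(w)
    q[w > t] = wp
    q[w < -t] = wn
    return q

w = np.random.randn(5, 5)
print(ternarize(w, wp=0.7, wn=-0.6))
```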
Deep Hashing Network for Efficient Similarity Retrieval
Due to the storage and retrieval efficiency, hashing has been widely deployed to approximate nearest neighbor search for large-scale multimedia retrieval. Supervised hashing, which improves the quality of hash coding by exploiting the sema…
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence
Product Quantization (PQ) has long been a mainstream for generating an exponentially large codebook at very low memory/time cost. Despite its success, PQ is still tricky for the decomposition of high-dimensional vector space, and the re…
Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights
This paper presents incremental network quantization (INQ), a novel method that efficiently converts any pre-trained full-precision convolutional neural network (CNN) model into a low-precision version whose weights are constrained…
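The entry describes constraining CNN weights to low-precision values. The sketch below rounds each weight to the nearest value in {0, ±2^e} for a small set of exponents, which is the weight format INQ targets; the incremental partition-and-retrain procedure that defines the method is omitted, and the exponent range is an assumption.

```python
import numpy as np

def round_to_powers_of_two(w, n_exponents=8):
    """Replace each weight by the nearest value among {0} U {±2^e} for n_exponents exponents."""
    max_exp = np.floor(np.log2(np.max(np.abs(w)) + 1e-12))
    exps = max_exp - np.arange(n_exponents)                      # e.g. max_exp, max_exp - 1, ...
    levels = np.concatenate(([0.0], 2.0 ** exps, -(2.0 ** exps)))
    nearest = np.argmin(np.abs(w[..., None] - levels), axis=-1)  # index of the closest level
    return levels[nearest]

w = np.random.randn(4, 4)
print(round_to_powers_of_two(w))
```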
QLoRA: Efficient Finetuning of Quantized LLMs
We present QLoRA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance. QLoRA backpropagates gradients through a f…
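A hedged sketch of how QLoRA-style finetuning is commonly set up with the Hugging Face transformers, bitsandbytes, and peft libraries: the base model is loaded with 4-bit NF4 quantization and double quantization, and trainable LoRA adapters are attached on top. The model id and LoRA hyperparameters below are placeholders, not values from the paper.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 base weights with double quantization; compute runs in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "your-base-model",              # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)

# Low-rank adapters are the only trainable parameters; gradients flow through the frozen 4-bit base.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # placeholder module names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```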
Throughput Analysis of Massive MIMO Uplink With Low-Resolution ADCs
We investigate the uplink throughput achievable by a multiple-user (MU) massive multiple-input multiple-output (MIMO) system, in which the base station is equipped with a large number of low-resolution analog-to-digital converters (ADCs). …
Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
Transformer-based architectures have become the de facto models for a range of Natural Language Processing tasks. In particular, BERT-based models achieved significant accuracy gains on GLUE tasks, CoNLL-03 and SQuAD. However, BERT ba…
Reduced Reference Perceptual Quality Model With Application to Rate Control for Video-Based Point Cloud Compression
In rate-distortion optimization, the encoder settings are determined by maximizing a reconstruction quality measure subject to a constraint on the bitrate. One of the main challenges of this approach is to define a quality measure that can…
One-Bit Over-the-Air Aggregation for Communication-Efficient Federated Edge Learning: Design and Convergence Analysis
Federated edge learning (FEEL) is a popular framework for model training at an edge server using data distributed at edge devices (e.g., smart-phones and sensors) without compromising their privacy. In the FEEL framework, edge devices peri…
Learned Step Size Quantization
Deep networks run with low precision operations at inference time offer power and space advantages over high precision alternatives, but need to overcome the challenge of maintaining high accuracy as precision decreases. Here, we present a…
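A minimal PyTorch sketch of quantization with a learnable step size and a straight-through estimator, which is the core idea of the entry above; the gradient-scale factor used in the paper and other details are omitted, so treat this as an assumption-laden illustration rather than the published method.

```python
import torch
import torch.nn as nn

class LearnedStepQuantizer(nn.Module):
    """Quantize to signed integers with a trainable step size; rounding uses a straight-through estimator."""

    def __init__(self, bits=4):
        super().__init__()
        self.qn = -(2 ** (bits - 1))
        self.qp = 2 ** (bits - 1) - 1
        self.step = nn.Parameter(torch.tensor(0.1))   # learnable quantization step

    def forward(self, x):
        scaled = torch.clamp(x / self.step, self.qn, self.qp)
        # Straight-through estimator: round in the forward pass, identity gradient in the backward pass.
        rounded = scaled + (torch.round(scaled) - scaled).detach()
        return rounded * self.step

q = LearnedStepQuantizer(bits=4)
x = torch.randn(8, requires_grad=True)
y = q(x).sum()
y.backward()                        # gradients reach both x and q.step
print(q.step.grad, x.grad)
```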
Introduction to quantum electromagnetic circuits
The article is a short opinionated review of the quantum treatment of electromagnetic circuits, with no pretension to exhaustiveness. This review, which is an updated and modernized version of a previous set of Les Houches School l…
Significantly Improving Lossy Compression for Scientific Data Sets Based on Multidimensional Prediction and Error-Controlled Quantization
Today's HPC applications are producing extremely large amounts of data, such that data storage and analysis are becoming more challenging for scientific research. In this work, we design a new error-controlled lossy compression algorithm f…
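A 1-D toy sketch of the prediction plus error-controlled quantization idea described above, assuming a previous-value predictor: each value is predicted from the decoder-visible reconstruction and the residual is quantized so the pointwise error stays within a user-set bound. The multidimensional predictors and entropy coding of the actual algorithm are not shown.

```python
import numpy as np

def compress_1d(data, error_bound=1e-3):
    """Encode data as integer quantization codes while keeping |data - reconstruction| <= error_bound."""
    codes = np.empty(len(data), dtype=np.int64)
    recon = np.empty(len(data), dtype=np.float64)
    prev = 0.0
    for i, x in enumerate(data):
        pred = prev                                        # previous-value predictor (decoder knows it too)
        codes[i] = int(round((x - pred) / (2.0 * error_bound)))
        prev = pred + codes[i] * 2.0 * error_bound         # reconstruction reused for the next prediction
        recon[i] = prev
    return codes, recon

data = np.cumsum(np.random.randn(1000)) * 0.01
codes, recon = compress_1d(data, error_bound=1e-3)
print(np.max(np.abs(data - recon)) <= 1e-3)                # True: pointwise error is bounded
```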
Redefining near-unity luminescence in quantum dots with photothermal threshold quantum yield
Superefficient light emission: A challenge to improving synthesis methods for superefficient light-emitting semiconductor nanoparticles is that current analytical methods cannot measure efficiencies above 99%. Hanifi et al. used phototherma…
Model compression via distillation and quantization
Deep neural networks (DNNs) continue to make significant advances, solving tasks from image classification to translation or reinforcement learning. One aspect of the field receiving considerable attention is efficiently executing deep mod…
Soft-to-Hard Vector Quantization for End-to-End Learning Compressible Representations
We present a new approach to learn compressible representations in deep architectures with an end-to-end training strategy. Our method is based on a soft (continuous) relaxation of quantization and entropy, which we anneal to their discret…
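A minimal PyTorch sketch of the soft-to-hard relaxation described above: each value is replaced by a softmax-weighted mixture of codebook centers, and as the temperature is annealed toward zero the assignment hardens into nearest-center quantization. Scalar centers and the annealing schedule are assumptions for the example.

```python
import torch

def soft_quantize(x, centers, temperature=1.0):
    """Differentiable soft assignment of each value in x to a weighted mix of codebook centers."""
    dist = (x.unsqueeze(-1) - centers) ** 2                 # squared distance to every center
    weights = torch.softmax(-dist / temperature, dim=-1)    # soft (continuous) assignment
    return (weights * centers).sum(dim=-1)

centers = torch.tensor([-1.0, -0.25, 0.0, 0.25, 1.0])
x = torch.randn(6)
for t in (1.0, 0.1, 0.01):                                  # annealing the temperature hardens the assignment
    print(t, soft_quantize(x, centers, temperature=t))
```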
On the Spectral Efficiency of Massive MIMO Systems With Low-Resolution ADCs
The low-resolution analog-to-digital convertor (ADC) is a promising solution to significantly reduce the power consumption of radio frequency circuits in massive multiple-input multiple-output (MIMO) systems. In this letter, we investig…
FedPAQ: A Communication-Efficient Federated Learning Method with Periodic Averaging and Quantization
Federated learning is a distributed framework according to which a model is trained over a set of devices, while keeping data localized. This framework faces several systems-oriented challenges which include (i) communication bottleneck si…
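A hedged NumPy sketch of one communication round combining the two ingredients in the title: clients run several local steps before communicating (periodic averaging) and send quantized model updates, which the server averages. The `client_local_train` helper is hypothetical and stands in for the local optimizer, and the simple stochastic quantizer is an assumption, not the paper's exact operator.

```python
import numpy as np

def quantize_update(delta, s=4):
    """Unbiased low-bit stochastic quantization of a model update (simplified)."""
    scale = np.max(np.abs(delta)) + 1e-12
    levels = np.abs(delta) / scale * s
    lower = np.floor(levels)
    rounded = lower + (np.random.rand(*delta.shape) < (levels - lower))  # stochastic rounding
    return np.sign(delta) * rounded * scale / s

def fedpaq_round(global_w, clients, client_local_train, s=4):
    """One round: each client trains locally for several steps, quantizes its update, server averages."""
    updates = []
    for c in clients:
        local_w = client_local_train(global_w.copy(), c)   # hypothetical helper: runs local SGD steps
        updates.append(quantize_update(local_w - global_w, s=s))
    return global_w + np.mean(updates, axis=0)

# Toy usage: "training" just nudges the weights toward each client's target vector.
clients = [np.full(10, i, dtype=np.float64) for i in range(3)]
local_train = lambda w, target: w + 0.1 * (target - w)
w = np.zeros(10)
for _ in range(5):
    w = fedpaq_round(w, clients, local_train)
print(w)
```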
Deep Metric Learning to Rank
We propose a novel deep metric learning method by revisiting the learning to rank approach. Our method, named FastAP, optimizes the rank-based Average Precision measure, using an approximation derived from distance quantization. FastAP has…
Single Path One-Shot Neural Architecture Search with Uniform Sampling
We revisit the one-shot Neural Architecture Search (NAS) paradigm and analyze its advantages over existing NAS approaches. Existing one-shot methods, however, are hard to train and not yet effective on large-scale datasets like ImageNet. Thi…
UVeQFed: Universal Vector Quantization for Federated Learning
Traditional deep learning models are trained at a centralized server using labeled data samples collected from end devices or users. Such data samples often include private information, which the users may not be willing to share. Feder…
Extremely Low Bit Neural Network: Squeeze the Last Bit Out With ADMM
Although deep learning models are highly effective for various learning tasks, their high computational costs prohibit the deployment to scenarios where either memory or computational resources are limited. In this paper, we focus on compr…
Secure and Robust Fragile Watermarking Scheme for Medical Images
Over the past decade, advances in computer-based communication and health services have made image security an urgent requirement for both safety and non-safety medical applications. This paper proposes a new frag…