Explanipedia

Instant neural graphics primitives with a multiresolution hash encoding

Thomas Müller, Alex Evans, Christoph Schied, Alexander Keller · 2022

Computer science

Neural graphics primitives, parameterized by fully connected neural networks, can be costly to train and evaluate. We reduce this cost with a versatile new input encoding that permits the use of a smaller network without sacrificing qualit…

A Scalable Multicore Architecture With Heterogeneous Memory Structures for Dynamic Neuromorphic Asynchronous Processors (DYNAPs)

Saber Moradi, Ning Qiao, Fabio Stefanini, Giacomo Indiveri · 2017

Computer science

Neuromorphic computing systems comprise networks of neurons that use asynchronous events for both computation and communication. This type of representation offers several advantages in terms of bandwidth and power consumption in neuromorp…

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré · 2022

Computer science

Transformers are slow and memory-hungry on long sequences, since the time and memory complexity of self-attention are quadratic in sequence length. Approximate attention methods have attempted to address this problem by trading off model q…

DRISA

Shuangchen Li, Dimin Niu, Krishna T. Malladi, Hongzhong Zheng, Bob Brennan, et al. · 2017

Computer science

Data movement between the processing units and the memory in traditional von Neumann architecture is creating the "memory wall" problem. To bridge the gap, two approaches, the memory-rich processor (more on-chip memory) and the compute-cap…

Processing data where it makes sense: Enabling in-memory computation

Onur Mutlu, Saugata Ghose, Juan Gómez-Luna, Rachata Ausavarungnirun · 2019

Computer science · Biology

Today's systems are overwhelmingly designed to move data to computation. This design choice goes directly against at least three key trends in systems that cause performance, scalability and energy bottlenecks: (1) data access from memory …

In‐Memory Vector‐Matrix Multiplication in Monolithic Complementary Metal–Oxide–Semiconductor‐Memristor Integrated Circuits: Design Choices, Challenges, and Perspectives

Amirali Amirsoleimani, Fabien Alibart, Victor Yon, Jianxiong Xu, M. Reza Pazhouhandeh, et al. · 2020

Computer science · Engineering · Physics

The low communication bandwidth between memory and processing units in conventional von Neumann machines does not support the requirements of emerging applications that rely extensively on large sets of data. More recent computing paradigm…

Near-global climate simulation at 1 km resolution: establishing a performance baseline on 4888 GPUs with COSMO 5.0

Oliver Fuhrer, Tarun Chadha, Torsten Hoefler, Grzegorz Kwaśniewski, Xavier Lapillonne, et al. · 2018

Computer science · Mathematics · Biology

The best hope for reducing long-standing global climate model biases is by increasing resolution to the kilometer scale. Here we present results from an ultrahigh-resolution non-hydrostatic climate model for a near-global setup running on …

Long Short-Term Memory Recurrent Neural Network for Automatic Speech Recognition

Jane Oruh, Serestina Viriri, Adekanmi Adeyinka Adegun · 2022

Computer science · Geography

Automatic speech recognition (ASR) is one of the most demanding tasks in natural language processing owing to its complexity. Recently, deep learning approaches have been deployed for this task and have been proven to outperform traditiona…

Over 100x Faster Bootstrapping in Fully Homomorphic Encryption through Memory-centric Optimization with GPUs

Wonkyung Jung, Sangpyo Kim, Jung Ho Ahn, Jung Hee Cheon, Younho Lee · 2021

Computer science · Economics

Fully homomorphic encryption (FHE) has been gaining popularity as an emerging means of enabling an unlimited number of operations on an encrypted message without decryption. A major drawback of FHE is its high computational cost. Specif…

Simultaneous Multi-Layer Access

Donghyuk Lee, Saugata Ghose, Gennady Pekhimenko, Samira Khan, Onur Mutlu · 2016

Computer science

3D-stacked DRAM alleviates the limited memory bandwidth bottleneck that exists in modern systems by leveraging through silicon vias (TSVs) to deliver higher external memory channel bandwidth. Today’s systems, however, cannot fully utilize …

An FPGA-Based CNN Accelerator Integrating Depthwise Separable Convolution

Bing Liu, Danyin Zou, Lei Feng, Shou Feng, Ping Fu, et al. · 2019

Computer science · Mathematics

The Convolutional Neural Network (CNN) has been used in many fields and has achieved remarkable results, such as image classification, face detection, and speech recognition. Compared to GPU (graphics processing unit) and ASIC, an FPGA (fie…

Demystifying CXL Memory with Genuine CXL-Ready Systems and Devices

Yan Sun, Yifan Yuan, Zeduo Yu, Reese Kuper, Chihun Song, et al. · 2023

Computer science

This code release includes testing scripts for some figures, our benchmark Memo, and our tuning scheme Caption.

The Mondrian Data Engine

Mario Drumond, Alexandros Daglis, Nooshin Mirzadeh, Dmitrii Ustiugov, Javier Picorel, et al. · 2017

Computer science · Art · Engineering

The increasing demand for extracting value out of ever-growing data poses an ongoing challenge to system designers, a task only made trickier by the end of Dennard scaling. As the performance density of traditional CPU-centric architecture…

A Study of the Fundamental Performance Characteristics of GPUs and CPUs for Database Analytics

Anil Shanbhag, Samuel Madden, Xiangyao Yu · 2020

Computer science · Geography

There has been a significant amount of excitement and recent work on GPU-based database systems. Previous work has claimed that these systems can perform orders of magnitude better than CPU-based database systems on analytical workloads such…

Ultra-Efficient Processing In-Memory for Data Intensive Applications

Mohsen Imani, Saransh Gupta, Tajana Rosing · 2017

Computer science · Biology · Engineering

Recent years have witnessed a rapid growth in the domain of Internet of Things (IoT). This network of billions of devices generates and exchanges huge amounts of data. The limited cache capacity and memory bandwidth make transferring and pr…

Dissecting the NVidia Turing T4 GPU via Microbenchmarking

Zhe Jia, Marco Maggioni, J. A. Smith, Daniele Paolo Scarpazza · 2019

Computer science

In 2019, the rapid rate at which GPU manufacturers refresh their designs, coupled with their reluctance to disclose microarchitectural details, is still a hurdle for those software designers who want to extract the highest possible perform…

RFVP

Amir Yazdanbakhsh, Gennady Pekhimenko, Bradley Thwaites, Hadi Esmaeilzadeh, Onur Mutlu, et al. · 2016

Computer science · Engineering

This article aims to tackle two fundamental memory bottlenecks: limited off-chip bandwidth (bandwidth wall) and long access latency (memory wall). To achieve this goal, our approach exploits the inherent error resilience of a wide range of…

Classifying Memory Access Patterns for Prefetching

Grant Ayers, Heiner Litz, Christos Kozyrakis, Parthasarathy Ranganathan · 2020

Computer science · Philosophy · Biology

Prefetching is a well-studied technique for addressing the memory access stall time of contemporary microprocessors. However, despite a large body of related work, the memory access behavior of applications is not well understood, and it r…

FPGA-Based Near-Memory Acceleration of Modern Data-Intensive Applications

Gagandeep Singh, Mohammed Alser, Damla Senol Cali, Dionysios Diamantopoulos, Juan Gómez-Luna, et al. · 2021

Computer science

Modern data-intensive applications demand high computation capabilities with strict power constraints. Unfortunately, such applications suffer from a significant waste of both execution cycles and energy in current computing systems due to…

SELD-TCN: Sound Event Localization & Detection via Temporal Convolutional Networks

Karim Guirguis, Christoph Schorn, Andre Guntoro, Sherif Abdulatif, Bin Yang · 2020

Computer science · Art · Physics

The understanding of the surrounding environment plays a critical role in autonomous robotic systems, such as self-driving cars. Extensive research has been carried out concerning visual perception. Yet, to obtain a more complete percep…

Demystifying the characteristics of 3D-stacked memories: A case study for Hybrid Memory Cube

Ramyad Hadidi, Bahar Asgari, Burhan Ahmad Mudassar, Saibal Mukhopadhyay, Sudhakar Yalamanchili, et al. · 2017

Computer science · Engineering · Physics

Three-dimensional (3D)-stacking technology, which enables the integration of DRAM and logic dies, offers high bandwidth and low energy consumption. This technology also empowers new memory designs for executing tasks not traditionally asso…

Kleio

Thaleia Dimitra Doudali, Sergey Blagodurov, Abhinav Vishnu, Sudhanva Gurumurthi, Ada Gavrilovska · 2019

Computer science · Economics

The increasing demand of big data analytics for more main memory capacity in datacenters and exascale computing environments is driving the integration of heterogeneous memory technologies. The new technologies exhibit vastly greater diffe…

APPROX-NoC

Rahul Boyapati, Jiayi Huang, Pritam Majumder, Ki Hwan Yum, Eun Jung Kim · 2017

Computer science

The trend of unsustainable power consumption and large memory bandwidth demands in massively parallel multicore systems, with the advent of the big data era, has brought upon the onset of alternate computation paradigms utilizing heterogen…

CAIRO

Ramyad Hadidi, Lifeng Nai, Hyojong Kim, Hyesoon Kim · 2017

Computer science

Three-dimensional (3D)-stacking technology and the memory-wall problem have popularized processing-in-memory (PIM) concepts again, which offers the benefits of bandwidth and energy savings by offloading computations to functional units ins…

WRPN: Wide Reduced-Precision Networks

Asit Mishra, Eriko Nurvitadhi, Jeffrey Cook, Debbie Marr · 2018

Computer science · Mathematics · Biology

For computer vision applications, prior works have shown the efficacy of reducing numeric precision of model parameters (network weights) in deep neural networks. Activation maps, however, occupy a large memory footprint during both the tr…

Benchmarking a New Paradigm: An Experimental Analysis of a Real Processing-in-Memory Architecture

Juan Gómez-Luna, Izzat El Hajj, Ivan Fernandez, Christina Giannoula, Geraldo F. Oliveira, et al. · 2021

Computer science · Art

Many modern workloads, such as neural networks, databases, and graph processing, are fundamentally memory-bound. For such workloads, the data movement between main memory and CPU cores imposes a significant overhead in terms of both latenc…

Breaking High-Resolution CNN Bandwidth Barriers With Enhanced Depth-First Execution

Koen Goetschalckx, Marian Verhelst · 2019

Computer science

Convolutional neural networks (CNNs) now also start to reach impressive performance on non-classification image processing tasks, such as denoising, demosaicing, super-resolution, and super slow motion. Consequently, CNNs are increasingly …

UPC++: A High-Performance Communication Framework for Asynchronous Computation

John Bachan, Scott B. Baden, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, et al. · 2019

Computer science · Physics · Economics

© 2019 IEEE. UPC++ is a C++ library that supports high-performance computation via an asynchronous communication framework. This paper describes a new incarnation that differs substantially from its predecessor, and we discuss the reasons f…

LAcc

Quan Deng, Youtao Zhang, Minxuan Zhang, Jun Yang · 2019

Computer science

PIM (Processing-in-memory)-based CNN (Convolutional neural network) accelerators leverage the characteristics of basic memory cells to enable simple logic and arithmetic operations so that the bandwidth constraint can be effectively allevi…

Potential of a modern vector supercomputer for practical applications: performance evaluation of SX-ACE

Ryusuke Egawa, Kazuhiko Komatsu, Shintaro Momose, Yoko Isobe, Akihiro Musa, et al. · 2017

Computer science · Geography

Achieving a high sustained simulation performance is the most important concern in the HPC community. To this end, many kinds of HPC system architectures have been proposed, and the diversity of the HPC systems grows rapidly. Under this ci…
