Computer memory
The FlashAttention Paradigm: Re-architecting Transformers for Memory-Optimal Scalability
The Transformer architecture has revolutionized deep learning, particularly in natural language processing and computer vision. However, its core self-attention mechanism suffers from quadratic memory and computational complexity with resp…
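The memory saving that this line of work targets is easiest to see in miniature: rather than materializing the full N×N score matrix, attention can be computed over key/value tiles with a running (online) softmax. The following is an illustrative NumPy sketch of that general idea, not the paper's fused kernel; the tile size and function name are chosen for exposition only.

```python
import numpy as np

def tiled_attention(Q, K, V, tile=128):
    """Attention without materializing the full N x N score matrix.

    Processes K/V in tiles while maintaining a running row max and
    softmax normalizer (online softmax), so peak extra memory is
    O(N * tile) per step rather than O(N^2). Sketch only -- real
    kernels fuse this loop on-chip in SRAM.
    """
    N, d = Q.shape
    out = np.zeros_like(Q)              # running weighted sum of values
    m = np.full(N, -np.inf)             # running row max of scores
    l = np.zeros(N)                     # running softmax normalizer
    for start in range(0, K.shape[0], tile):
        Kt, Vt = K[start:start + tile], V[start:start + tile]
        S = Q @ Kt.T / np.sqrt(d)       # (N, tile) scores for this tile
        m_new = np.maximum(m, S.max(axis=1))
        scale = np.exp(m - m_new)       # rescale previous accumulators
        P = np.exp(S - m_new[:, None])  # tile-local unnormalized probs
        out = out * scale[:, None] + P @ Vt
        l = l * scale + P.sum(axis=1)
        m = m_new
    return out / l[:, None]

# Agrees with dense softmax attention up to floating-point error:
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((512, 64)) for _ in range(3))
S = Q @ K.T / np.sqrt(64)
dense = np.exp(S - S.max(axis=1, keepdims=True))
dense = dense / dense.sum(axis=1, keepdims=True) @ V
assert np.allclose(tiled_attention(Q, K, V), dense, atol=1e-6)
```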
Elastic Gradient Checkpointing: Scaling Deep Learning Beyond Conventional Memory Limits
The relentless pursuit of larger and more complex deep learning models has increasingly encountered a fundamental bottleneck: the finite memory capacity of conventional hardware accelerators. As model architectures scale in depth, width, a…
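Gradient checkpointing, the baseline such elastic schemes build on, trades compute for memory by discarding intermediate activations in the forward pass and recomputing them during backward. A minimal PyTorch sketch follows; `torch.utils.checkpoint.checkpoint_sequential` is the standard API, while the model, sizes, and segment count are illustrative rather than the paper's elastic policy.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# A deep stack whose activations would normally all be kept for backward.
model = nn.Sequential(*[nn.Sequential(nn.Linear(1024, 1024), nn.ReLU())
                        for _ in range(24)])
x = torch.randn(32, 1024, requires_grad=True)

# Split the stack into 4 segments: only segment-boundary activations are
# stored; each segment's internals are recomputed during backward. Stored
# activations drop from O(L) layers to roughly O(segments + L/segments),
# at the cost of one extra forward pass.
out = checkpoint_sequential(model, 4, x, use_reentrant=False)
out.sum().backward()
```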
FlashAttention: Breaking the Memory Wall for Efficient Self-Attention Scaling
The self-attention mechanism is a cornerstone of the Transformer architecture, driving significant advancements across natural language processing, computer vision, and other domains. However, its quadratic computational complexity and, cr…
Flash Attention: Unlocking Bandwidth-Optimal Self-Attention for Trillion-Parameter Models
The Transformer architecture, with its cornerstone self-attention mechanism, has revolutionized deep learning, particularly in natural language processing. However, as models scale towards trillions of parameters and sequence lengths grow,…
Compute-in-Memory Based on Emerging Non-Volatile Memories: RRAM, MRAM, and FeRAM
In the era of artificial intelligence, Internet of things and big data, processing massive data puts forward unprecedented requirements for the throughput and energy efficiency of computing systems. In traditional von Neumann architectures…
A Memory-Constrained Bayesian Optimization via Robust Online Memory Estimation
Bayesian optimization (BO) is a memory-intensive algorithm that requires training and evaluating an expensive objective function. In contrast to previous works that use an offline memory estimation to make BO memory-efficient, we propose a…
CAMformer: Associative Memory is All You Need
Transformers face scalability challenges due to the quadratic cost of attention, which involves dense similarity computations between queries and keys. We propose CAMformer, a novel accelerator that reinterprets attention as an associative…
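The reinterpretation is easiest to see in a toy form: with binarized queries and keys, attention's dot products reduce to counting bit agreements, which is exactly the match operation a content-addressable memory (CAM) array performs in place. The sketch below shows only that correspondence; it is a speculative illustration, not CAMformer's accelerator design.

```python
import numpy as np

def cam_attention(Q, K, V):
    """Attention viewed as an associative-memory lookup.

    Binarizing queries/keys to {-1, +1} makes the score Qb @ Kb.T equal
    to (bit agreements - disagreements) = d - 2 * Hamming distance,
    i.e. the match score a CAM computes in hardware. Illustration only.
    """
    Qb = np.sign(Q)                      # binarized query patterns
    Kb = np.sign(K)                      # binarized stored key patterns
    scores = Qb @ Kb.T                   # CAM-style match scores
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V                   # retrieve associated values

rng = np.random.default_rng(1)
Q, K, V = (rng.standard_normal((8, 32)) for _ in range(3))
print(cam_attention(Q, K, V).shape)      # (8, 32)
```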
Indexed Parametric Memory (IPM): A New Paradigm for Lossless Lifelong LLM Memory
We introduce Indexed Parametric Memory (IPM), the third memory paradigm for large language models after non-parametric retrieval (RAG) and traditional parametric memory. In IPM: Raw conversation history is stored externally with unique IDs…
mohamedorhan/Electromagnetic-Energy-Memory-EEM-: Electromagnetic Energy Memory (EEM) — Official v1.0.0 Release
Electromagnetic Energy Memory (EEM), Official v1.0.0 Scientific Release. This release contains the complete implementation and documentation for the Electromagnetic Energy Memory (EEM) model — a resonant, non-chemical energy-storage framewor…
EONSim: An NPU Simulator for On-Chip Memory and Embedding Vector Operations
Embedding vector operations are a key component of modern deep neural network workloads. Unlike matrix operations with deterministic access patterns, embedding vector operations exhibit input data-dependent and non-deterministic memory acc…
Implementation of Low Power Memristor Content Addressable Memory Using FinFET
Research interest is turning rapidly toward the large-scale development of memristor devices for industrial applications, and future technologies are expected to build on upcoming advances in memristor-based devices. A memristor regulates t…
Revisiting Memory Hierarchies with CMM-H: Use Device-side Caching to Integrate DRAM and SSD for a Hybrid CXL Memory
Influence of Proton Radiation on the Degradation and Failures of RAM Chips
The influence of proton irradiation on semiconductor random-access memory (RAM) chips in the space environment is examined. The relevance of the topic stems from the fact that cosmic-ray protons are capable of causing both instantaneous ma…
Physical complexity and black hole quantum computers
The theory of computational complexity is based on the tradeoff between two computational resources, memory space and computer time. This paper investigates the physical counterparts of these resources. Memory space is the number of bits o…
Atomically-Thin Freestanding Racetrack Memory Devices
Advances in freestanding membranes allow novel heterostructures to be formed from distinct families of materials in 2D or 3D configurations. Recently, this technique has been used to form a 3D racetrack memory device by transferring a comp…
Storage Class Memory is Dead, All Hail Managed-Retention Memory: Rethinking Memory for the AI Era
Design and Implementation of Memory Controller for Byte Access from Data Memory for SoC’s Devices
Modern computing systems, particularly System-on-Chip (SoC) architectures, incorporate multiple processors, integrated memory, and control logic to enhance efficiency. These architectures are prevalent in contemporary electroni…
Self-Refresh Memory in Pixel Circuit With 18-bit Color Depth for Liquid Crystal Displays
A Survey on Computing-in-Memory (CiM) and Emerging Nonvolatile Memory (NVM) Simulators
Modern computer applications have become highly data-intensive, giving rise to an increase in data traffic between the processor and memory units. Computing-in-Memory (CiM) has shown great promise as a solution to this aptly named von Neum…
3D NAND flash memory for a Pseudo quantum computer platform
Overcoming sensory-memory interference in working memory circuits
Memories of recent stimuli are crucial for guiding behavior, but the sensory pathways responsible for encoding these memories are continuously bombarded by new sensory experiences. How the brain overcomes interference between sensory input…