Explanipedia

Efficient Memory Management for Large Language Model Serving with PagedAttention Open

Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, et al. · 2023

Computer science

High throughput serving of large language models (LLMs) requires batching sufficiently many requests at a time. However, existing systems struggle because the key-value cache (KV cache) memory for each request is huge and grows and shrinks…

Nimble Page Management for Tiered Memory Systems Open

Zi Yan, Daniel Lustig, David Nellans, Abhishek Bhattacharjee · 2019

Computer science

Software-controlled heterogeneous memory systems have the potential to increase the performance and cost efficiency of computing systems. However they can only deliver on this promise if supported by efficient page management policies and …

VAULT Open

Meysam Taassori, Ali Shafiee, Rajeev Balasubramonian · 2018

Computer science Mathematics

Intel's SGX offers state-of-the-art security features, including confidentiality, integrity, and authentication (CIA) when accessing sensitive pages in memory. Sensitive pages are placed in an Enclave Page Cache (EPC) within the physical m…

A Framework for Memory Oversubscription Management in Graphics Processing Units Open

Chen Li, Rachata Ausavarungnirun, Christopher J. Rossbach, Youtao Zhang, Onur Mutlu, et al. · 2019

Computer science

Modern discrete GPUs support unified memory and demand paging. Automatic management of data movement between CPU memory and GPU memory dramatically reduces developer effort. However, when application working sets exceed physical memory cap…

MEMTIS: Efficient Memory Tiering with Dynamic Page Classification and Page Size Determination Open

Taehyung Lee, Sumit Kumar Monga, Changwoo Min, Young Ik Eom · 2023

Computer science

The evergrowing memory demand fueled by datacenter workloads is the driving force behind new memory technology innovations (e.g., NVM, CXL). Tiered memory is a promising solution which harnesses such multiple memory types with varying capa…

Static Memory Deduplication for Performance Optimization in Cloud Computing Open

Gangyong Jia, Guangjie Han, Hao Wang, Xuan Yang · 2017

Computer science

In a cloud computing environment, the number of virtual machines (VMs) on a single physical server and the number of applications running on each VM are continuously growing. This has led to an enormous increase in the demand of memory cap…

FaaSnap Open

Lixiang Ao, George Porter, Geoffrey M. Voelker · 2022

Computer science

FaaSnap is a VM snapshot-based platform that uses a set of complementary optimizations to improve function cold-start performance for Function-as-a-Service (FaaS) applications. Compact loading set files take better advantage of prefetching…

Windows Memory Forensics: Detecting (Un)Intentionally Hidden Injected Code by Examining Page Table Entries Open

Frank E. Block, Andreas Dewald · 2019

Computer science

Malware utilizes code injection techniques to either manipulate other processes (e.g. done by banking trojans) or hide its existence. With some exceptions, such as ROP gadgets, the injected code needs to be executable by the CPU (at least …

Perforated Page: Supporting Fragmented Memory Allocation for Large Pages Open

Chang Hyun Park, Sang-Hoon Cha, Bo-Kyeong Kim, Youngjin Kwon, David Black-Schaffer, et al. · 2020

Computer science

The availability of large pages has dramatically improved the efficiency of address translation for applications that use large contiguous regions of memory. However, large pages can be difficult to allocate due to fragmented memory, non-m…

Deconstructing the Energy Consumption of the Mobile Page Load Open

Yi Cao, Javad Nejati, Muhammad Wajahat, Aruna Balasubramanian, Anshul Gandhi · 2017

Computer science Engineering

Mobile Web page performance is critical to content providers, service providers, and users, as Web browsers are one of the most popular apps on phones. Slow Web pages are known to adversely affect profits and lead to user abandonment. Whil…

Secure Page Fusion with VUsion Open

Marco Oliverio, Kaveh Razavi, Herbert Bos, Cristiano Giuffrida · 2017

Computer science

To reduce memory pressure, modern operating systems and hypervisors such as Linux/KVM deploy page-level memory fusion to merge physical memory pages with the same content (i.e., page fusion). A write to a fused memory page triggers a copy-…

On-demand-fork Open

Kaiyang Zhao, Sishuai Gong, Pedro Fonseca · 2021

Computer science

Fork has long been the process creation system call for Unix. At its inception, fork was hailed as an efficient system call due to its use of copy-on-write on memory shared between parent and child processes. However, application memory de…

Adaptive Page Migration Policy With Huge Pages in Tiered Memory Systems Open

Taekyung Heo, Yang Wang, Wei Cui, Jaehyuk Huh, Lintao Zhang · 2020

Computer science Biology Economics

To accommodate the growing demand for memory capacity in a cost-effective way, multiple types of memory are incorporated in a single system. In such tiered memory systems consisting of small fast and large slow memory components, accuratel…

TPP: Transparent Page Placement for CXL-Enabled Tiered-Memory Open

Hasan Al Maruf, Hao Wang, Abhishek Dhanotia, Johannes Weiner, Niket Agarwal, et al. · 2022

Computer science

The increasing demand for memory in hyperscale applications has led to memory becoming a large portion of the overall datacenter spend. The emergence of coherent interfaces like CXL enables main memory expansion and offers an efficient sol…

Page Size Aware Cache Prefetching Open

Georgios Vavouliotis, Gino Chacon, Lluc Alvarez, Paul V. Gratz, Daniel A. Jiménez, et al. · 2022

Computer science

The increase in working set sizes of contemporary applications outpaces the growth in cache sizes, resulting in frequent main memory accesses that deteriorate system performance due to the disparity between processor and memory speeds. P…

WIRD: An Efficiency Migration Scheme in Hybrid DRAM and PCM Main Memory for Image Processing Applications Open

Na Niu, Fangfa Fu, Bing Yang, Jiacai Yuan, Fengchang Lai, et al. · 2019

Computer science Chemistry Economics

Using a hybrid main memory in embedded systems to process image processing applications has become an irresistible trend. However, the performance deficiencies (less write endurance and relative longer write latency) in phase change memory…

Reducing Minor Page Fault Overheads through Enhanced Page Walker Open

Chandrahas Tirumalasetty, Chih Chieh Chou, Narasimha Reddy, Paul V. Gratz, Ayman Abouelwafa · 2022

Computer science

Application virtual memory footprints are growing rapidly in all systems from servers down to smartphones. To address this growing demand, system integrators are incorporating ever larger amounts of main memory, warranting rethinking of me…

InvisiPage Open

Shaizeen Aga, Satish Narayanasamy · 2019

Computer science History Mathematics

State-of-art secure processors like Intel SGX remain susceptible to leaking page-level address trace of an application via the page fault channel in which a malicious OS induces spurious page faults and deduces application's secrets from i…

Tight Bounds for Parallel Paging and Green Paging Open

Kunal Agrawal, Michael A. Bender, Rathish Das, William Kuszmaul, Enoch Peserico, et al. · 2021

Computer science Mathematics

In the parallel paging problem, there are p processors that share a cache of size k. The goal is to partition the cache among the processors over time in order to minimize their average completion time. For this long-standing open problem,…

DPW-LRU: An Efficient Buffer Management Policy Based on Dynamic Page Weight for Flash Memory in Cyber-Physical Systems Open

Youwei Yuan, Jintao Zhang, Guangjie Han, Gangyong Jia, Lamei Yan, et al. · 2019

Computer science Philosophy

Owing to its high performance, small size, and low energy consumption, NAND flash memory has been extensively adopted in cyber-physical systems. However, the inherent characteristics of flash memory, including not-in-place update and asymm…

Contiguity Representation in Page Table for Memory Management Units Open

Jae Young Hur · 2018

Computer science Mathematics

Conventional page-based memory management schemes have certain overheads related to system performance and memory utilization mainly due to page table walks. In addition, conventional translation look-aside buffers (TLBs) often suffer from…

Revisiting Swapping in User-Space With Lightweight Threading Open

Kan Zhong, Wenlin Cui, Xin Chen, Qiao Li, Zhe Yang, et al. · 2023

Computer science Mathematics

Memory-intensive applications, such as in-memory databases, caching systems and key-value stores, are increasingly demanding larger main memory to fit their working sets. Conventional swapping can enlarge the memory capacity by paging out …

Fine-grain Quantitative Analysis of Demand Paging in Unified Virtual Memory Open

Tyler Allen, Bennett Cooper, Rong Ge · 2023

Computer science

The abstraction of a shared memory space over separate CPU and GPU memory domains has eased the burden of portability for many HPC codebases. However, users pay for ease of use provided by system-managed memory with a moderate-to-high perf…

Flexible Page-level Memory Access Monitoring Based on Virtualization Hardware Open

Kai Lü, Wenzhe Zhang, Xiaoping Wang, Mikel Luján, Andy Nisbet · 2017

Computer science

Page protection is often used to achieve memory access monitoring in many applications, dealing with program-analysis, checkpoint-based failure recovery, and garbage collection in managed runtime systems. Typically, low overhead access mon…

A Novel Longest Distance First Page Replacement Algorithm Open

Gyanendra Kumar, Parul Tomar · 2017

Computer science

Objectives: To improve the performance of computer in program execution by employing Longest Distance First page replacement algorithm in memory management. Method: There are many traditional page replacement algorithms used in virtual mem…

Smart scene management for IoT-based constrained devices using checkpointing Open

François Aïssaoui, Gene Cooperman, Thierry Monteil, Saïd Tazi · 2016

Computer science

Online Parallel Paging with Optimal Makespan Open

Kunal Agrawal, Michael A. Bender, Rathish Das, William Kuszmaul, Enoch Peserico, et al. · 2022

Computer science Mathematics Biology

The classical paging problem can be described as follows: given a cache that can hold up to k pages (or blocks) and a sequence of requests to pages, how should we manage the cache so as to maximize performance, or, in other words, complete …

Mitosis: Transparently Self-Replicating Page-Tables for Large-Memory Machines Open

Reto Achermann, Ashish Panwar, Abhishek Bhattacharjee, Timothy Roscoe, Jayneel Gandhi · 2020

Computer science Mathematics

This repository contains artifacts of the paper Mitosis: Transparently Self-Replicating Page-Tables for Large-Memory Machines by Reto Achermann, Jayneel Gandhi, Timothy Roscoe, Abhishek Bhattacharjee, and Ashish Panwar to appear in the 25t…

Demand paging