Marcin Copik
XaaS Containers: Performance-Portable Representation With Source and IR Containers
High-performance computing (HPC) systems and cloud data centers are converging, and containers are becoming the default method of portable software deployment. Yet, while containers simplify software management, they face significant perfo…
Confidential LLM Inference: Performance and Cost Across CPU and GPU TEEs
Large Language Models (LLMs) are increasingly deployed on converged Cloud and High-Performance Computing (HPC) infrastructure. However, as LLMs handle confidential inputs and are fine-tuned on costly, proprietary datasets, their heightened…
AI Factories: It's time to rethink the Cloud-HPC divide
The strategic importance of artificial intelligence is driving a global push toward Sovereign AI initiatives. Nationwide governments are increasingly developing dedicated infrastructures, called AI Factories (AIF), to achieve technological…
DaCe AD: Unifying High-Performance Automatic Differentiation for Machine Learning and Scientific Computing
Automatic differentiation (AD) is a set of techniques that systematically applies the chain rule to compute the gradients of functions without requiring human intervention. Although the fundamentals of this technology were established deca…
Cppless: Single-Source and High-Performance Serverless Programming in C++
The rise of serverless computing introduced a new class of scalable, elastic, and widely available parallel workers in the cloud. Many systems and applications benefit from offloading computations and parallel tasks to dynamically allocate…
Higher-Order Graph Databases
Recent advances in graph databases (GDBs) have been driving interest in large-scale analytics, yet current systems fail to support higher-order (HO) interactions beyond first-order (one-hop) relations, which are crucial for tasks such as s…
Affordable AI Assistants with Knowledge Graph of Thoughts
Large Language Models (LLMs) are revolutionizing the development of AI assistants capable of performing diverse tasks across domains. However, current state-of-the-art LLM-driven agents face significant challenges, including high operation…
Reasoning Language Models: A Blueprint
Reasoning language models (RLMs), also known as Large Reasoning Models (LRMs), such as OpenAI's o1 and o3, DeepSeek-R1, and Alibaba's QwQ, have redefined AI's problem-solving capabilities by extending LLMs with advanced reasoning mechanism…
Core Hours and Carbon Credits: Incentivizing Sustainability in HPC
Realizing a shared responsibility between providers and consumers is critical to manage the sustainability of HPC. However, while cost may motivate efficiency improvements by infrastructure operators, broader progress is impeded by a lack …
A Priori Loop Nest Normalization: Automatic Loop Scheduling in Complex Applications
The same computations are often expressed differently across software projects and programming languages. In particular, how computations involving loops are expressed varies due to the many possibilities to permute and compose loops. Sinc…
SeBS-Flow: Benchmarking Serverless Cloud Function Workflows
Serverless computing has emerged as a prominent paradigm, with a significant adoption rate among cloud customers. While this model offers advantages such as abstraction from the deployment and resource scheduling, it also poses limitations…
XaaS: Acceleration as a Service to Enable Productive High-Performance Cloud Computing
High-performance computing (HPC) and the cloud have evolved independently, specializing their innovations into performance or productivity. Acceleration as a Service (XaaS) is a recipe to empower both fields with a shared execution platfor…
Demystifying Chains, Trees, and Graphs of Thoughts
The field of natural language processing (NLP) has witnessed significant progress in recent years, with a notable focus on improving large language models' (LLM) performance through innovative prompting techniques. Among these, prompt engi…
Software Resource Disaggregation for HPC with Serverless Computing
Aggregated HPC resources have rigid allocation systems and programming models which struggle to adapt to diverse and changing workloads. Consequently, HPC systems fail to efficiently use the large pools of unused memory and increase the ut…
XaaS: Acceleration as a Service to Enable Productive High-Performance Cloud Computing
HPC and Cloud have evolved independently, specializing their innovations into performance or productivity. Acceleration as a Service (XaaS) is a recipe to empower both fields with a shared execution platform that provides transparent acces…
User-guided Page Merging for Memory Deduplication in Serverless Systems
Serverless computing is an emerging cloud paradigm that offers an elastic and scalable allocation of computing resources with pay-as-you-go billing. In the Function-as-a-Service (FaaS) programming model, applications comprise short-lived a…
FMI: Fast and Cheap Message Passing for Serverless Functions
Serverless functions provide elastic scaling and a fine-grained billing model, making Function-as-a-Service (FaaS) an attractive programming model. However, for distributed jobs that benefit from large-scale and dynamic parallelism, the la…
MOM: Matrix Operations in MLIR
Modern research in code generators for dense linear algebra computations has shown the ability to produce optimized code with a performance which compares and often exceeds the one of state-of-the-art implementations by domain experts. How…
FaaSKeeper: Learning from Building Serverless Services with ZooKeeper as an Example
FaaS (Function-as-a-Service) revolutionized cloud computing by replacing persistent virtual machines with dynamically allocated resources. This shift trades locality and statefulness for a pay-as-you-go model more suited to variable and in…
SISA: Set-Centric Instruction Set Architecture for Graph Mining on Processing-in-Memory Systems
Simple graph algorithms such as PageRank have been the target of numerous hardware accelerators. Yet, there also exist much more complex graph mining algorithms for problems such as clustering or maximal clique listing. These algorithms ar…
SeBS: A Serverless Benchmark Suite for Function-as-a-Service Computing
This upload contains the software prototype, data, analysis scripts, and replication scripts for the paper "SeBS: A Serverless Benchmark Suite for Function-as-a-Service Computing" (ACM/IFIP Middleware 2021). With our artifact we provide th…
Work-Stealing Prefix Scan: Addressing Load Imbalance in Large-Scale Image Registration
Parallelism patterns (e.g., map or reduce) have proven to be effective tools for parallelizing high-performance applications. In this paper, we study the recursive registration of a series of electron microscopy images - a time consuming a…
rFaaS: Enabling High Performance Serverless with RDMA and Leases
High performance is needed in many computing systems, from batch-managed supercomputers to general-purpose cloud platforms. However, scientific clusters lack elastic parallelism, while clouds cannot offer competitive costs for high-perform…
GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra
We propose GraphMineSuite (GMS): the first benchmarking suite for graph mining that facilitates evaluating and constructing high-performance graph mining algorithms. First, GMS comes with a benchmark specification based on extensive litera…