Marcin Chrapek
Confidential LLM Inference: Performance and Cost Across CPU and GPU TEEs
Large Language Models (LLMs) are increasingly deployed on converged Cloud and High-Performance Computing (HPC) infrastructure. However, as LLMs handle confidential inputs and are fine-tuned on costly, proprietary datasets, their heightened…
SDR-RDMA: Software-Defined Reliability Architecture for Planetary Scale RDMA Communication
RDMA is vital for efficient distributed training across datacenters, but millisecond-scale latencies complicate the design of its reliability layer. We show that depending on long-haul link characteristics, such as drop rate, distance and …
Fortify Your Foundations: Practical Privacy and Security for Foundation Model Deployments In The Cloud
Foundation Models (FMs) display exceptional performance in tasks such as natural language processing and are being applied across a growing range of disciplines. Although typically trained on large public datasets, FMs are often fine-tuned…
Network-Offloaded Bandwidth-Optimal Broadcast and Allgather for Distributed AI
In the Fully Sharded Data Parallel (FSDP) training pipeline, collective operations can be interleaved to maximize the communication/computation overlap. In this scenario, outstanding operations such as Allgather and Reduce-Scatter can comp…
Multi-Head RAG: Solving Multi-Aspect Problems with LLMs
Retrieval Augmented Generation (RAG) enhances the abilities of Large Language Models (LLMs) by enabling the retrieval of documents into the LLM context to provide more accurate and relevant responses. Existing RAG solutions do not focus on…
LLAMP: Assessing Network Latency Tolerance of HPC Applications with Linear Programming
The shift towards high-bandwidth networks driven by AI workloads in data centers and HPC clusters has unintentionally aggravated network latency, adversely affecting the performance of communication-intensive HPC applications. As large-sca…
Software Resource Disaggregation for HPC with Serverless Computing
Aggregated HPC resources have rigid allocation systems and programming models which struggle to adapt to diverse and changing workloads. Consequently, HPC systems fail to efficiently use the large pools of unused memory and increase the ut…
OSMOSIS: Enabling Multi-Tenancy in Datacenter SmartNICs
Multi-tenancy is essential for unleashing SmartNIC's potential in datacenters. Our systematic analysis in this work shows that existing on-path SmartNICs have resource multiplexing limitations. For example, existing solutions lack multi-te…
The saphenous vein harvest procedure affects the arteriovenous system and postoperative wound healing in patients following coronary aortic bypass surgery
Froń K, Chrapek M, Bratkowski W, Ruci O, Pacholewicz J. The saphenous vein harvest procedure affects the arteriovenous system and postoperative wound healing…