Mert Hidayetoğlu
Shift Parallelism: Low-Latency, High-Throughput LLM Inference for Dynamic Workloads
Efficient parallelism is necessary for achieving low-latency, high-throughput inference with large language models (LLMs). Tensor parallelism (TP) is the state-of-the-art method for reducing LLM response latency; however, GPU communications…
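To make the communication cost concrete, here is a minimal sketch of a row-parallel linear layer, the basic building block of tensor parallelism: each GPU multiplies by its shard of the weight matrix, and an all-reduce sums the partial outputs. This is an illustration in plain PyTorch under assumed names, not the paper's Shift Parallelism implementation; the all-reduce on every layer is exactly the GPU communication the abstract refers to.

```python
import torch
import torch.distributed as dist

class RowParallelLinear(torch.nn.Module):
    """Each rank owns a column slice of the weight; outputs are summed."""
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        world = dist.get_world_size()
        assert in_features % world == 0
        self.weight = torch.nn.Parameter(
            torch.empty(out_features, in_features // world))
        torch.nn.init.xavier_uniform_(self.weight)

    def forward(self, x_shard: torch.Tensor) -> torch.Tensor:
        # x_shard: this rank's slice of the input activations.
        partial = x_shard @ self.weight.t()             # local GEMM
        dist.all_reduce(partial, op=dist.ReduceOp.SUM)  # cross-GPU sum (TP cost)
        return partial
```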
Arctic Inference with Shift Parallelism: Fast and Efficient Open Source Inference System for Enterprise AI
Inference is now the dominant AI workload, yet existing systems force trade-offs between latency, throughput, and cost. Arctic Inference, an open-source vLLM plugin from Snowflake AI Research, introduces Shift Parallelism, a dynamic parall…
SSDTrain: An Activation Offloading Framework to SSDs for Faster Large Language Model Training
The growth rate of the GPU memory capacity has not been able to keep up with that of the size of large language models (LLMs), hindering the model training process. In particular, activations -- the intermediate tensors produced during for…
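The core mechanism, spilling activations to storage during the forward pass and reloading them during the backward pass, can be sketched with PyTorch's saved-tensors hooks. This toy version writes one file per activation and is not SSDTrain itself, which hides the I/O behind compute; the temporary-file scheme is an assumption for illustration.

```python
import os
import tempfile
import torch

def pack_to_disk(tensor: torch.Tensor) -> str:
    # Spill the activation to a temporary file (stand-in for an SSD path).
    fd, path = tempfile.mkstemp(suffix=".pt")
    os.close(fd)
    torch.save(tensor, path)
    return path

def unpack_from_disk(path: str) -> torch.Tensor:
    tensor = torch.load(path)
    os.unlink(path)
    return tensor

x = torch.randn(1024, 1024, requires_grad=True)
with torch.autograd.graph.saved_tensors_hooks(pack_to_disk, unpack_from_disk):
    y = (x @ x).relu().sum()  # saved activations go to disk, not memory
y.backward()                  # activations are reloaded on demand
```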
HiCCL: A Hierarchical Collective Communication Library
HiCCL (Hierarchical Collective Communication Library) addresses the growing complexity and diversity in high-performance network architectures. As GPU systems have evolved into networks of GPUs with different multilevel communication hier…
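The flavor of hierarchical composition can be sketched as a two-level all-reduce: reduce within each node over the fast local interconnect, all-reduce across one leader per node, then broadcast back. HiCCL's actual interface and decompositions are more general than this; the group layout below is an assumption.

```python
import torch
import torch.distributed as dist

def hierarchical_all_reduce(tensor: torch.Tensor, gpus_per_node: int):
    rank, world = dist.get_rank(), dist.get_world_size()
    node, local = divmod(rank, gpus_per_node)
    n_nodes = world // gpus_per_node

    # All ranks build the same groups: one per node, plus the node leaders.
    intra_groups = [dist.new_group(list(range(n * gpus_per_node,
                                              (n + 1) * gpus_per_node)))
                    for n in range(n_nodes)]
    leader_group = dist.new_group([n * gpus_per_node for n in range(n_nodes)])

    leader = node * gpus_per_node
    # 1) Reduce onto the node leader over fast intra-node links.
    dist.reduce(tensor, dst=leader, group=intra_groups[node])
    # 2) All-reduce among node leaders over the inter-node network.
    if local == 0:
        dist.all_reduce(tensor, group=leader_group)
    # 3) Broadcast the result back within each node.
    dist.broadcast(tensor, src=leader, group=intra_groups[node])
    return tensor
```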
CommBench: Micro-Benchmarking Hierarchical Networks with Multi-GPU, Multi-NIC Nodes
Modern high-performance computing systems have multiple GPUs and network interface cards (NICs) per node. The resulting network architectures have multilevel hierarchies of subnetworks with different interconnect and software technologies.…
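A micro-benchmark of this kind boils down to timing repeated transfers over one chosen path. The sketch below measures unidirectional point-to-point bandwidth between two ranks using torch.distributed; CommBench itself sweeps many patterns, message sizes, and communication libraries, so treat this as an assumed minimal analogue rather than its API.

```python
import time
import torch
import torch.distributed as dist

def p2p_bandwidth(nbytes: int, warmup: int = 5, iters: int = 20) -> float:
    """Return one-directional bandwidth in GB/s between ranks 0 and 1."""
    rank = dist.get_rank()
    buf = torch.empty(nbytes, dtype=torch.uint8, device="cuda")
    start = 0.0
    for i in range(warmup + iters):
        if i == warmup:                 # start timing after warmup rounds
            torch.cuda.synchronize()
            start = time.perf_counter()
        if rank == 0:
            dist.send(buf, dst=1)
        elif rank == 1:
            dist.recv(buf, src=0)
    torch.cuda.synchronize()
    return nbytes * iters / (time.perf_counter() - start) / 1e9
```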
Hector: An Efficient Programming and Compilation Framework for Implementing Relational Graph Neural Networks in GPU Architectures
Relational graph neural networks (RGNNs) are graph neural networks with dedicated structures for modeling the different types of nodes and edges in heterogeneous graphs. While RGNNs have been increasingly adopted in many real-world applica…
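What distinguishes an RGNN from an ordinary GNN is a separate transformation per edge type. A minimal version of that pattern in plain PyTorch is below; Hector's point is to compile this pattern into efficient fused GPU kernels instead of looping over relations in Python as done here.

```python
import torch

class RelationalGraphLayer(torch.nn.Module):
    def __init__(self, num_relations: int, in_dim: int, out_dim: int):
        super().__init__()
        # One weight matrix per relation (edge type): the RGNN hallmark.
        self.weights = torch.nn.Parameter(
            torch.randn(num_relations, in_dim, out_dim) * 0.01)

    def forward(self, x: torch.Tensor, edges_by_relation):
        # edges_by_relation: one (src_idx, dst_idx) LongTensor pair per type.
        out = torch.zeros(x.size(0), self.weights.size(2), device=x.device)
        for r, (src, dst) in enumerate(edges_by_relation):
            msg = x[src] @ self.weights[r]  # relation-specific transform
            out.index_add_(0, dst, msg)     # aggregate at destination nodes
        return torch.relu(out)
```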
Fast Numerical Integration Techniques for 2.5-Dimensional Inverse Problems
Inverse scattering involving microwave and ultrasound waves requires the numerical solution of a nonlinear optimization problem. To alleviate the computational burden of a full three-dimensional (3-D) inverse problem, it is common practice to a…
MemXCT: Design, Optimization, Scaling, and Reproducibility of X-Ray Tomography Imaging
This work extends our previous research, "MemXCT: Memory-centric X-ray CT Reconstruction with Massive Parallelization," originally published at the SC19 conference (Hidayetoglu et al., 2019), with reproducibility of the co…
Graph Neural Network Training with Data Tiering
Graph Neural Networks (GNNs) have shown success in learning from graph-structured data, with applications to fraud detection, recommendation, and knowledge graph reasoning. However, training GNNs efficiently is challenging because: 1) GPU m…
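The tiering idea can be sketched as a feature store that keeps all features in pinned host memory, caches the hottest nodes' features on the GPU, and serves each gather from whichever tier holds the node. Scoring nodes by a precomputed access estimate is an assumption made here for illustration; the paper's policy and mechanism differ in detail.

```python
import torch

class TieredFeatureStore:
    def __init__(self, features_cpu: torch.Tensor, scores: torch.Tensor,
                 gpu_budget: int):
        self.cpu_feats = features_cpu.pin_memory()   # cold tier (host)
        hot = torch.topk(scores, gpu_budget).indices
        self.gpu_feats = features_cpu[hot].cuda()    # hot tier (device)
        # Map node id -> GPU cache slot; -1 means the feature stays on host.
        self.slot = torch.full((features_cpu.size(0),), -1, dtype=torch.long)
        self.slot[hot] = torch.arange(gpu_budget)

    def gather(self, node_ids: torch.Tensor) -> torch.Tensor:
        slots = self.slot[node_ids]
        hit = slots >= 0
        out = torch.empty(node_ids.numel(), self.cpu_feats.size(1),
                          device="cuda")
        hit_gpu = hit.cuda()
        out[hit_gpu] = self.gpu_feats[slots[hit].cuda()]
        out[~hit_gpu] = self.cpu_feats[node_ids[~hit]].cuda(non_blocking=True)
        return out
```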
Large graph convolutional network training with GPU-oriented data communication architecture
Graph Convolutional Networks (GCNs) are increasingly adopted in large-scale graph-based recommender systems. Training GCNs requires the minibatch generator to traverse graphs and sample the sparsely located neighboring nodes to obtain thei…
PyTorch-Direct: Enabling GPU Centric Data Access for Very Large Graph Neural Network Training with Irregular Accesses
With the increasing adoption of graph neural networks (GNNs) in the machine learning community, GPUs have become an essential tool to accelerate GNN training. However, training GNNs on very large graphs that do not fit in GPU memory is sti…
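The data-path contrast that motivates PyTorch-Direct (and the GPU-oriented communication architecture above) fits in a few lines. The baseline below is stock PyTorch: the CPU performs the irregular gather, then issues one bulk copy. The GPU-centric alternative is described in a comment because it relies on the paper's unified-tensor mechanism rather than a stock API.

```python
import torch

features = torch.randn(1_000_000, 128).pin_memory()  # host-resident features
idx = torch.randint(0, features.size(0), (4096,))

# Conventional path: CPU gathers scattered rows, then one H2D copy.
batch = features[idx].cuda(non_blocking=True)

# GPU-centric path (conceptual): with a PyTorch-Direct unified tensor, the
# gather kernel runs on the GPU and dereferences pinned host memory in place,
# so only the needed bytes cross PCIe and the CPU gather stage disappears.
```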
Petascale XCT: 3D Image Reconstruction with Hierarchical Communications on Multi-GPU Nodes
X-ray computed tomography is a commonly used technique for noninvasive imaging at synchrotron facilities. Iterative tomographic reconstruction algorithms are often preferred for recovering high quality 3D volumetric images from 2D X-ray im…
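For reference, the kind of iterative solver meant here can be written as a SIRT-style update: forward-project the current image, weight the residual, and back-project. The NumPy sketch below uses a dense system matrix and a plain loop; it is far from the memory-centric, hierarchically communicating multi-GPU implementation the paper describes.

```python
import numpy as np

def sirt(A: np.ndarray, b: np.ndarray, iters: int = 100) -> np.ndarray:
    """A: (rays x voxels) system matrix; b: measured sinogram, flattened."""
    x = np.zeros(A.shape[1])
    row_sum = np.maximum(A.sum(axis=1), 1e-12)  # forward-projection weights
    col_sum = np.maximum(A.sum(axis=0), 1e-12)  # back-projection weights
    for _ in range(iters):
        residual = (b - A @ x) / row_sum        # weighted data mismatch
        x += (A.T @ residual) / col_sum         # weighted back-projection
    return x
```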
At-Scale Sparse Deep Neural Network Inference With Efficient GPU Implementation
This paper presents GPU performance optimization and scaling results for inference models of the Sparse Deep Neural Network Challenge 2020. Demands for network quality have increased rapidly, pushing the size and thus the memory requiremen…
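The challenge workload itself is compact to state: repeated sparse matrix products, each followed by a biased, clipped ReLU. A SciPy sketch follows; applying the bias only to stored entries matches the reference semantics because the challenge bias is negative, so absent entries clamp to zero either way (the cap of 32 and the bias value are taken from the challenge configuration and should be treated as assumptions here). The paper's tuned GPU kernels replace this computation wholesale.

```python
import numpy as np
import scipy.sparse as sp

def sparse_dnn_inference(Y, layers, bias: float, ymax: float = 32.0):
    """Y: (batch x neurons) CSR activations; layers: sparse weight matrices."""
    for W in layers:
        Y = Y @ W                                   # sparse-sparse product
        Y.data = np.clip(Y.data + bias, 0.0, ymax)  # biased, clipped ReLU
        Y.eliminate_zeros()                         # drop zeroed activations
    return Y

# Example with random data; the real challenge ships fixed weights and inputs.
Y0 = sp.random(64, 1024, density=0.1, format="csr")
W = sp.random(1024, 1024, density=0.01, format="csr")
out = sparse_dnn_inference(Y0, [W] * 3, bias=-0.3)
```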
Efficient Inference on GPUs for the Sparse Deep Neural Network Graph Challenge 2020
This paper presents GPU performance optimization and scaling results for the Sparse Deep Neural Network Challenge 2020. Demands for network quality have increased rapidly, pushing the size and thus the memory requirements of many neural ne…
MemXCT
X-ray computed tomography (XCT) is used regularly at synchrotron light sources to study the internal morphology of materials at high resolution. However, experimental constraints, such as radiation sensitivity, can result in noisy or unders…
Rotation-as-fast-axis scanning-probe x-ray tomography: the importance of angular diversity for fly-scan modes
We investigate the effects of angular diversity on image-reconstruction quality of scanning-probe x-ray tomography for both fly- and step-mode data collection. We propose probe-coverage maps as a tool for both visualizing and quantifying t…