Mert Hidayetoğlu
Shift Parallelism: Low-Latency, High-Throughput LLM Inference for Dynamic Workloads
Efficient parallelism is necessary for achieving low-latency, high-throughput inference with large language models (LLMs). Tensor parallelism (TP) is the state-of-the-art method for reducing LLM response latency; however, GPU communications…
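To make the communication cost concrete, here is a minimal sketch of a row-parallel linear layer, the basic building block of tensor parallelism: each GPU multiplies by its shard of the weight matrix, and an all-reduce sums the partial outputs. This is an illustration in plain PyTorch under assumed names, not the paper's Shift Parallelism implementation; the all-reduce on every layer is exactly the GPU communication the abstract refers to.

```python
import torch
import torch.distributed as dist

class RowParallelLinear(torch.nn.Module):
    """Each rank owns a column slice of the weight; outputs are summed."""
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        world = dist.get_world_size()
        assert in_features % world == 0
        self.weight = torch.nn.Parameter(
            torch.empty(out_features, in_features // world))
        torch.nn.init.xavier_uniform_(self.weight)

    def forward(self, x_shard: torch.Tensor) -> torch.Tensor:
        # x_shard: this rank's slice of the input activations.
        partial = x_shard @ self.weight.t()             # local GEMM
        dist.all_reduce(partial, op=dist.ReduceOp.SUM)  # cross-GPU sum (TP cost)
        return partial
```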
Arctic Inference with Shift Parallelism: Fast and Efficient Open Source Inference System for Enterprise AI
Inference is now the dominant AI workload, yet existing systems force trade-offs between latency, throughput, and cost. Arctic Inference, an open-source vLLM plugin from Snowflake AI Research, introduces Shift Parallelism, a dynamic parall…
SSDTrain: An Activation Offloading Framework to SSDs for Faster Large Language Model Training
The growth rate of the GPU memory capacity has not been able to keep up with that of the size of large language models (LLMs), hindering the model training process. In particular, activations -- the intermediate tensors produced during for…
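The core mechanism, spilling activations to storage during the forward pass and reloading them during the backward pass, can be sketched with PyTorch's saved-tensors hooks. This toy version writes one file per activation and is not SSDTrain itself, which hides the I/O behind compute; the temporary-file scheme is an assumption for illustration.

```python
import os
import tempfile
import torch

def pack_to_disk(tensor: torch.Tensor) -> str:
    # Spill the activation to a temporary file (stand-in for an SSD path).
    fd, path = tempfile.mkstemp(suffix=".pt")
    os.close(fd)
    torch.save(tensor, path)
    return path

def unpack_from_disk(path: str) -> torch.Tensor:
    tensor = torch.load(path)
    os.unlink(path)
    return tensor

x = torch.randn(1024, 1024, requires_grad=True)
with torch.autograd.graph.saved_tensors_hooks(pack_to_disk, unpack_from_disk):
    y = (x @ x).relu().sum()  # saved activations go to disk, not memory
y.backward()                  # activations are reloaded on demand
```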
HiCCL: A Hierarchical Collective Communication Library
HiCCL (Hierarchical Collective Communication Library) addresses the growing complexity and diversity in high-performance network architectures. As GPU systems have evolved into networks of GPUs with different multilevel communication hier…
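The flavor of hierarchical composition can be sketched as a two-level all-reduce: reduce within each node over the fast local interconnect, all-reduce across one leader per node, then broadcast back. HiCCL's actual interface and decompositions are more general than this; the group layout below is an assumption.

```python
import torch
import torch.distributed as dist

def hierarchical_all_reduce(tensor: torch.Tensor, gpus_per_node: int):
    rank, world = dist.get_rank(), dist.get_world_size()
    node, local = divmod(rank, gpus_per_node)
    n_nodes = world // gpus_per_node

    # All ranks build the same groups: one per node, plus the node leaders.
    intra_groups = [dist.new_group(list(range(n * gpus_per_node,
                                              (n + 1) * gpus_per_node)))
                    for n in range(n_nodes)]
    leader_group = dist.new_group([n * gpus_per_node for n in range(n_nodes)])

    leader = node * gpus_per_node
    # 1) Reduce onto the node leader over fast intra-node links.
    dist.reduce(tensor, dst=leader, group=intra_groups[node])
    # 2) All-reduce among node leaders over the inter-node network.
    if local == 0:
        dist.all_reduce(tensor, group=leader_group)
    # 3) Broadcast the result back within each node.
    dist.broadcast(tensor, src=leader, group=intra_groups[node])
    return tensor
```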
CommBench: Micro-Benchmarking Hierarchical Networks with Multi-GPU, Multi-NIC Nodes
Modern high-performance computing systems have multiple GPUs and network interface cards (NICs) per node. The resulting network architectures have multilevel hierarchies of subnetworks with different interconnect and software technologies.…
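A micro-benchmark of this kind boils down to timing repeated transfers over one chosen path. The sketch below measures unidirectional point-to-point bandwidth between two ranks using torch.distributed; CommBench itself sweeps many patterns, message sizes, and communication libraries, so treat this as an assumed minimal analogue rather than its API.

```python
import time
import torch
import torch.distributed as dist

def p2p_bandwidth(nbytes: int, warmup: int = 5, iters: int = 20) -> float:
    """Return one-directional bandwidth in GB/s between ranks 0 and 1."""
    rank = dist.get_rank()
    buf = torch.empty(nbytes, dtype=torch.uint8, device="cuda")
    start = 0.0
    for i in range(warmup + iters):
        if i == warmup:                 # start timing after warmup rounds
            torch.cuda.synchronize()
            start = time.perf_counter()
        if rank == 0:
            dist.send(buf, dst=1)
        elif rank == 1:
            dist.recv(buf, src=0)
    torch.cuda.synchronize()
    return nbytes * iters / (time.perf_counter() - start) / 1e9
```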
Hector: An Efficient Programming and Compilation Framework for Implementing Relational Graph Neural Networks in GPU Architectures
Relational graph neural networks (RGNNs) are graph neural networks with dedicated structures for modeling the different types of nodes and edges in heterogeneous graphs. While RGNNs have been increasingly adopted in many real-world applica…
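What distinguishes an RGNN from an ordinary GNN is a separate transformation per edge type. A minimal version of that pattern in plain PyTorch is below; Hector's point is to compile this pattern into efficient fused GPU kernels instead of looping over relations in Python as done here.

```python
import torch

class RelationalGraphLayer(torch.nn.Module):
    def __init__(self, num_relations: int, in_dim: int, out_dim: int):
        super().__init__()
        # One weight matrix per relation (edge type): the RGNN hallmark.
        self.weights = torch.nn.Parameter(
            torch.randn(num_relations, in_dim, out_dim) * 0.01)

    def forward(self, x: torch.Tensor, edges_by_relation):
        # edges_by_relation: one (src_idx, dst_idx) LongTensor pair per type.
        out = torch.zeros(x.size(0), self.weights.size(2), device=x.device)
        for r, (src, dst) in enumerate(edges_by_relation):
            msg = x[src] @ self.weights[r]  # relation-specific transform
            out.index_add_(0, dst, msg)     # aggregate at destination nodes
        return torch.relu(out)
```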
Fast Numerical Integration Techniques for 2.5-Dimensional Inverse Problems
Inverse scattering involving microwave and ultrasound waves requires the numerical solution of a nonlinear optimization problem. To alleviate the computational burden of a full three-dimensional (3-D) inverse problem, it is common practice to a…
MemXCT: Design, Optimization, Scaling, and Reproducibility of X-Ray Tomography Imaging
This work extends our previous research, "MemXCT: Memory-centric X-ray CT Reconstruction with Massive Parallelization," originally published at the SC19 conference (Hidayetoglu et al., 2019), with reproducibility of the co…
Graph Neural Network Training with Data Tiering
Graph Neural Networks (GNNs) have shown success in learning from graph-structured data, with applications to fraud detection, recommendation, and knowledge graph reasoning. However, training GNNs efficiently is challenging because: 1) GPU m…
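The tiering idea can be sketched as a feature store that keeps all features in pinned host memory, caches the hottest nodes' features on the GPU, and serves each gather from whichever tier holds the node. Scoring nodes by a precomputed access estimate is an assumption made here for illustration; the paper's policy and mechanism differ in detail.

```python
import torch

class TieredFeatureStore:
    def __init__(self, features_cpu: torch.Tensor, scores: torch.Tensor,
                 gpu_budget: int):
        self.cpu_feats = features_cpu.pin_memory()   # cold tier (host)
        hot = torch.topk(scores, gpu_budget).indices
        self.gpu_feats = features_cpu[hot].cuda()    # hot tier (device)
        # Map node id -> GPU cache slot; -1 means the feature stays on host.
        self.slot = torch.full((features_cpu.size(0),), -1, dtype=torch.long)
        self.slot[hot] = torch.arange(gpu_budget)

    def gather(self, node_ids: torch.Tensor) -> torch.Tensor:
        slots = self.slot[node_ids]
        hit = slots >= 0
        out = torch.empty(node_ids.numel(), self.cpu_feats.size(1),
                          device="cuda")
        hit_gpu = hit.cuda()
        out[hit_gpu] = self.gpu_feats[slots[hit].cuda()]
        out[~hit_gpu] = self.cpu_feats[node_ids[~hit]].cuda(non_blocking=True)
        return out
```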
Large graph convolutional network training with GPU-oriented data communication architecture
Graph Convolutional Networks (GCNs) are increasingly adopted in large-scale graph-based recommender systems. Training GCNs requires the minibatch generator to traverse graphs and sample the sparsely located neighboring nodes to obtain thei…
PyTorch-Direct: Enabling GPU Centric Data Access for Very Large Graph Neural Network Training with Irregular Accesses
With the increasing adoption of graph neural networks (GNNs) in the machine learning community, GPUs have become an essential tool to accelerate GNN training. However, training GNNs on very large graphs that do not fit in GPU memory is sti…
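The data-path contrast that motivates PyTorch-Direct (and the GPU-oriented communication architecture above) fits in a few lines. The baseline below is stock PyTorch: the CPU performs the irregular gather, then issues one bulk copy. The GPU-centric alternative is described in a comment because it relies on the paper's unified-tensor mechanism rather than a stock API.

```python
import torch

features = torch.randn(1_000_000, 128).pin_memory()  # host-resident features
idx = torch.randint(0, features.size(0), (4096,))

# Conventional path: CPU gathers scattered rows, then one H2D copy.
batch = features[idx].cuda(non_blocking=True)

# GPU-centric path (conceptual): with a PyTorch-Direct unified tensor, the
# gather kernel runs on the GPU and dereferences pinned host memory in place,
# so only the needed bytes cross PCIe and the CPU gather stage disappears.
```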
Petascale XCT: 3D Image Reconstruction with Hierarchical Communications on Multi-GPU Nodes
X-ray computed tomography is a commonly used technique for noninvasive imaging at synchrotron facilities. Iterative tomographic reconstruction algorithms are often preferred for recovering high quality 3D volumetric images from 2D X-ray im…
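For reference, the kind of iterative solver meant here can be written as a SIRT-style update: forward-project the current image, weight the residual, and back-project. The NumPy sketch below uses a dense system matrix and a plain loop; it is far from the memory-centric, hierarchically communicating multi-GPU implementation the paper describes.

```python
import numpy as np

def sirt(A: np.ndarray, b: np.ndarray, iters: int = 100) -> np.ndarray:
    """A: (rays x voxels) system matrix; b: measured sinogram, flattened."""
    x = np.zeros(A.shape[1])
    row_sum = np.maximum(A.sum(axis=1), 1e-12)  # forward-projection weights
    col_sum = np.maximum(A.sum(axis=0), 1e-12)  # back-projection weights
    for _ in range(iters):
        residual = (b - A @ x) / row_sum        # weighted data mismatch
        x += (A.T @ residual) / col_sum         # weighted back-projection
    return x
```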
At-Scale Sparse Deep Neural Network Inference With Efficient GPU Implementation
This paper presents GPU performance optimization and scaling results for inference models of the Sparse Deep Neural Network Challenge 2020. Demands for network quality have increased rapidly, pushing the size and thus the memory requiremen…
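The challenge workload itself is compact to state: repeated sparse matrix products, each followed by a biased, clipped ReLU. A SciPy sketch follows; applying the bias only to stored entries matches the reference semantics because the challenge bias is negative, so absent entries clamp to zero either way (the cap of 32 and the bias value are taken from the challenge configuration and should be treated as assumptions here). The paper's tuned GPU kernels replace this computation wholesale.

```python
import numpy as np
import scipy.sparse as sp

def sparse_dnn_inference(Y, layers, bias: float, ymax: float = 32.0):
    """Y: (batch x neurons) CSR activations; layers: sparse weight matrices."""
    for W in layers:
        Y = Y @ W                                   # sparse-sparse product
        Y.data = np.clip(Y.data + bias, 0.0, ymax)  # biased, clipped ReLU
        Y.eliminate_zeros()                         # drop zeroed activations
    return Y

# Example with random data; the real challenge ships fixed weights and inputs.
Y0 = sp.random(64, 1024, density=0.1, format="csr")
W = sp.random(1024, 1024, density=0.01, format="csr")
out = sparse_dnn_inference(Y0, [W] * 3, bias=-0.3)
```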
Efficient Inference on GPUs for the Sparse Deep Neural Network Graph Challenge 2020
This paper presents GPU performance optimization and scaling results for the Sparse Deep Neural Network Challenge 2020. Demands for network quality have increased rapidly, pushing the size and thus the memory requirements of many neural ne…
MemXCT
X-ray computed tomography (XCT) is used regularly at synchrotron light sources to study the internal morphology of materials at high resolution. However, experimental constraints, such as radiation sensitivity, can result in noisy or unders…
Rotation-as-fast-axis scanning-probe x-ray tomography: the importance of angular diversity for fly-scan modes
We investigate the effects of angular diversity on image-reconstruction quality of scanning-probe x-ray tomography for both fly- and step-mode data collection. We propose probe-coverage maps as a tool for both visualizing and quantifying t…