Explanipedia

Multi-Agent Code-Orchestrated Generation for Reliable Infrastructure-as-Code

Rana Nameer Hussain Khan, Dawood Wasif, Jin-Hee Cho, Ali R. Butt · 2025

The increasing complexity of cloud-native infrastructure has made Infrastructure-as-Code (IaC) essential for reproducible and scalable deployments. While large language models (LLMs) have shown promise in generating IaC snippets from natur…

User-based I/O Profiling for Leadership Scale HPC Workloads

Ahmad Hossein Yazdani, Arnab K. Paul, Ahmad Maroof Karimi, Feiyi Wang, Ali R. Butt · 2025

Computer science

I/O constitutes a significant portion of the runtime of most applications. Spawning many such applications concurrently on an HPC system leads to severe I/O contention. Thus, understanding and subsequently reducing I/O contention induced by…

Ensuring Fair LLM Serving Amid Diverse Applications

Redwan Ibne Seraj Khan, Kunal Jain, Haiying Shen, Ankur Mallick, Anjaly Parayil, et al. · 2024

Computer science Business

In a multi-tenant large language model (LLM) serving platform hosting diverse applications, some users may submit an excessive number of requests, causing the service to become unavailable to other users and creating unfairness. Existing f…

CIWARS: a web server for waterborne antibiotic resistance surveillance using longitudinal metagenomic data

Muhit Islam Emon, Yat Fei Cheung, James Stoll, Monjura Afrin Rumi, Connor Brown, et al. · 2024

Computer science Biology

The rise of antibiotic resistance (AR) is a major global health crisis, exacerbated by the overuse and misuse of antibiotics, leading to the rapid spread of antibiotic resistance genes (ARGs) in bacterial pathogens. This phenomenon poses s…

FLOAT: Federated Learning Optimizations with Automated Tuning

Ahmad Faraz Khan, Azal Ahmad Khan, Ahmed M. Abdelmoniem, Samuel Fountain, Ali R. Butt, et al. · 2024

Computer science Engineering

Federated Learning (FL) has emerged as a powerful approach that enables collaborative distributed model training without the need for data sharing. However, FL grapples with inherent heterogeneity challenges leading to issues such as strag…

Tarazu: An Adaptive End-to-end I/O Load-balancing Framework for Large-scale Parallel File Systems

Arnab K. Paul, Sarah Neuwirth, Bharti Wadhwa, Feiyi Wang, Sarp Oral, et al. · 2024

Computer science Mathematics Physics

The imbalanced I/O load on large parallel file systems affects the parallel I/O performance of high-performance computing (HPC) applications. One of the main reasons for I/O imbalances is the lack of a global view of system-wide resource c…

An End-to-end High-performance Deduplication Scheme for Docker Registries and Docker Container Storage Systems

Nannan Zhao, Muhui Lin, Hadeel Albahar, Arnab K. Paul, Zhijie Huan, et al. · 2024

Computer science Materials science Mathematics

The wide adoption of Docker containers for supporting agile and elastic enterprise applications has led to a broad proliferation of container images. The associated storage performance and capacity requirements place a high pressure on the…

Towards Persistent Memory based Stateful Serverless Computing for Big Data Applications

Yuze Li, Kevin Assogba, Abhijit Tripathy, Moiz Arif, M. Mustafa Rafique, et al. · 2023

Computer science

The Function-as-a-Service (FaaS) computing model has recently seen significant growth, especially for highly scalable, event-driven applications. The easy-to-deploy and cost-efficient fine-grained billing of FaaS is highly attractive to big…

A Survey on Attacks and Their Countermeasures in Deep Learning: Applications in Deep Neural Networks, Federated, Transfer, and Deep Reinforcement Learning

Haider Ali, Dian Chen, Matthew Harrington, Nathaniel Salazar, Mohannad Al Ameedi, et al. · 2023

Computer science

Deep Learning (DL) techniques are being used in various critical applications like self-driving cars. DL techniques such as Deep Neural Networks (DNN), Deep Reinforcement Learning (DRL), Federated Learning (FL), and Transfer Learning (TL) …

Towards cost-effective and resource-aware aggregation at Edge for Federated Learning

Ahmad Khan, Yuze Li, Ali Anwar, Yue Cheng, Thang Hoang, et al. · 2022

Computer science Economics

Federated Learning (FL) is a machine learning approach that addresses privacy and data transfer costs by computing on data at the source. It is particularly popular for Edge and IoT applications where the aggregator server of FL is in resource…

An Analysis of System Balance and Architectural Trends Based on Top500 Supercomputers

Awais Khan, Hyogi Sim, Sudharshan S. Vazhkudai, Ali R. Butt, Youngjae Kim · 2021

Computer science Engineering Economics

Supercomputer design is a complex, multi-dimensional optimization process, wherein several subsystems need to be reconciled to meet a desired figure of merit performance for a portfolio of applications and a budget constraint. However, ove…

Prediction of high-performance computing input/output variability and its application to optimization for system configurations

Li Xu, Thomas Lux, Tyler H. Chang, Bo Li, Yili Hong, et al. · 2021

Computer science

Performance variability is an important measure for a reliable high performance computing (HPC) system. Performance variability is affected by complicated interactions between numerous factors, such as CPU frequency, the number of input/ou…

Prediction of High-Performance Computing Input/Output Variability and Its Application to Optimization for System Configurations

Li Xu, Thomas Lux, Tyler H. Chang, Bo Li, Yili Hong, et al. · 2020

Computer science Physics Mathematics

Performance variability is an important measure for a reliable high performance computing (HPC) system. Performance variability is affected by complicated interactions between numerous factors, such as CPU frequency, the number of input/ou…

Understanding HPC Application I/O Behavior Using System Level Statistics

Arnab K. Paul, Olaf Faaland, Adam Moody, Elsa Gonsiorowski, Kathryn Mohror, et al. · 2020

Computer science

The processor performance of high performance computing (HPC) systems is increasing at a much higher rate than storage performance. This imbalance leads to I/O performance bottlenecks in massively parallel HPC applications. Therefore, ther…

Algorithm 1012

Tyler H. Chang, Layne T. Watson, Thomas Lux, Ali R. Butt, Kirk W. Cameron, et al. · 2020

Computer science Mathematics

DELAUNAYSPARSE contains both serial and parallel codes written in Fortran 2003 (with OpenMP) for performing medium- to high-dimensional interpolation via the Delaunay triangulation. To accommodate the exponential growth in the size of the …

MARBLE: A Multi-GPU Aware Job Scheduler for Deep Learning on HPC Systems

Jingoo Han, M. Mustafa Rafique, Luna Xu, Ali R. Butt, Seung-Hwan Lim, et al. · 2020

Computer science Philosophy Economics

Deep learning (DL) has become a key tool for solving complex scientific problems. However, managing the multi-dimensional large-scale data associated with DL, especially atop extant multiple graphics processing units (GPUs) in modern super…

An Integrated Indexing and Search Service for Distributed File Systems

Hyogi Sim, Awais Khan, Sudharshan S. Vazhkudai, Seung-Hwan Lim, Ali R. Butt, et al. · 2020

Computer science

Data services such as search, discovery, and management in scalable distributed environments have traditionally been decoupled from the underlying file systems, and are often deployed using external databases and indexing services. However…

Customizable Scale-Out Key-Value Stores

Ali Anwar, Yue Cheng, Hai Huang, Jingoo Han, Hyogi Sim, et al. · 2020

Computer science Mathematics

Enterprise KV stores are often not well suited for HPC applications, and thus cumbersome end-to-end KV design customization is required to meet the needs of modern HPC applications. To this end, in this article we present bespoKV, an adapt…

A Quantitative Study of Deep Learning Training on Heterogeneous Supercomputers

Jingoo Han, Luna Xu, M. Mustafa Rafique, Ali R. Butt, Seung-Hwan Lim · 2019

Computer science Physics Economics

Deep learning (DL) has become a key technique for solving complex problems in scientific research and discovery. DL training for science is substantially challenging because it has to deal with massive quantities of multi-dimensional data.…

iez: Resource Contention Aware Load Balancing for Large-Scale Parallel File Systems

Bharti Wadhwa, Arnab K. Paul, Sarah Neuwirth, Feiyi Wang, Sarp Oral, et al. · 2019

Computer science Geography Mathematics

Parallel I/O performance is crucial to sustaining scientific applications on large-scale High-Performance Computing (HPC) systems. However, I/O load imbalance in the underlying distributed and shared storage systems can significantly reduc…

BESPOKV: Application Tailored Scale-Out Key-Value Stores

Ali Anwar, Yue Cheng, Hai Huang, Jingoo Han, Hyogi Sim, et al. · 2018

Computer science

Enterprise KV stores are not well suited for HPC applications, and entail customization and cumbersome end-to-end KV design to extract the HPC application needs. To this end, in this paper we present BESPOKV, an adaptive, extensible, and s…

A Heterogeneity-Aware Task Scheduler for Spark

Luna Xu, Ali R. Butt, Seung-Hwan Lim, Ramakrishnan Kannan · 2018

Computer science Economics Philosophy

Big data processing systems such as Spark are employed in an increasing number of diverse applications—such as machine learning, graph computation, and scientific computing—each with dynamic and different resource needs. These applications…

An Analysis Workflow-Aware Storage System for Multi-Core Active Flash Arrays

Hyogi Sim, Geoffroy Vallée, Youngjae Kim, Sudharshan S. Vazhkudai, Devesh Tiwari, et al. · 2018

Computer science Engineering Economics

The need for novel data analysis is urgent in the face of a data deluge from modern applications. Traditional approaches to data analysis incur significant data movement costs, moving data back and forth between the storage system an…

Sizing Buffers of IoT Edge Routers

Jamal Ahmad Khan, Muhammad Shahzad, Ali R. Butt · 2018

Computer science Chemistry

In typical IoT systems, sensors and actuators are connected to small embedded computers, called IoT devices, and the IoT devices are connected to one or more appropriate cloud services over the internet through an edge access router. A ver…

Chameleon: An Adaptive Wear Balancer for Flash Clusters

Nannan Zhao, Ali Anwar, Yue Cheng, Salman Mohammed, Daping Li, et al. · 2018

Computer science Engineering Mathematics

NAND flash-based Solid State Devices (SSDs) offer the desirable features of high performance, energy efficiency, and fast-growing capacity. Thus, the use of SSDs is increasing in distributed storage systems. A key obstacle in this context …

Toward Transparent Data Management in Multi-Layer Storage Hierarchy of HPC Systems

Bharti Wadhwa, Suren Byna, Ali R. Butt · 2018

Computer science Economics

Upcoming exascale high-performance computing (HPC) systems are expected to comprise a multi-tier storage hierarchy, and thus will necessitate innovative storage and I/O mechanisms. Traditional disk and block-based interfaces and file systems…

Scaling up data-parallel analytics platforms: Linear algebraic operation cases

Luna Xu, Seung-Hwan Lim, Min Li, Ali R. Butt, Ramakrishnan Kannan · 2017

Computer science Mathematics Physics

Linear algebraic operations such as matrix manipulations form the kernel of many machine learning and other crucial algorithms. Scaling up as well as scaling out such algorithms is key to supporting large-scale data analysis that require…

Ali R. Butt YOU? Author Swipe