Dhabaleswar K. Panda
Characterizing Communication Patterns in Distributed Large Language Model Inference
Large Language Models (LLMs) built on transformer architectures have transformed natural language processing, achieving remarkable performance across diverse applications. While distributed inference frameworks enable practical deployment …
Molecular profiling of zoonotic hookworms infecting wild felids in northern India
Introduction: Hookworms are among the most common soil-transmitted helminths and generally inhabit the small intestine of a variety of domestic and wild animals. Due to the conservation of wild felids, only limited studies have been condu…
Accelerating Large Language Model Training with Hybrid GPU-based Compression
Data Parallelism (DP), Tensor Parallelism (TP), and Pipeline Parallelism (PP) are the three strategies widely adopted to enable fast and efficient Large Language Model (LLM) training. However, these approaches rely on data-intensive commun…
Demystifying the Communication Characteristics for Distributed Transformer Models
Deep learning (DL) models based on the transformer architecture have revolutionized many DL applications such as large language models (LLMs), vision transformers, audio generation, and time series prediction. Much of this progress has bee…
Accelerating MPI AllReduce Communication with Efficient GPU-Based Compression Schemes on Modern GPU Clusters
With the increasing scale of High-Performance Computing (HPC) and Deep Learning (DL) applications through GPU adaptation, the seamless communication of data stored on GPUs has become a critical factor in enhancing overall application perfo…
Creating intelligent cyberinfrastructure for democratizing AI
Artificial intelligence (AI) has the potential for vast societal and economic gain; yet applications are developed in a largely ad hoc manner, lacking coherent, standardized, modular, and reusable infrastructures. The NSF‐funded Intelligen…
The Case for Co-Designing Model Architectures with Hardware
While GPUs are responsible for training the vast majority of state-of-the-art deep learning models, the implications of their architecture are often overlooked when designing new deep learning (DL) models. As a consequence, modifying a DL …
How to Educate HPC-Enabled AI and Data Science to Students and Professionals in a Holistic Manner?
The fields of AI (including Machine Learning (ML) and Deep Learning (DL)) and Data Science are rapidly evolving. The effective development and usage of many models and the associated inference schemes depend on a good understanding of the u…
Tutorials
Recent advances in Machine and Deep Learning (ML/DL) have led to many exciting challenges and opportunities. Modern ML/DL and Data Science frameworks including TensorFlow, PyTorch, and Dask have emerged that offer high-performance training …
Optimizing Amber for Device-to-Device GPU Communication
Although direct GPU-to-GPU communication has been possible in MPI libraries for over a decade, the limited availability of compatible hardware at academic HPC centers has discouraged the development of algorithms in scientific applications…
MCR-DL: Mix-and-Match Communication Runtime for Deep Learning
In recent years, the training requirements of many state-of-the-art Deep Learning (DL) models have scaled beyond the compute and memory capabilities of a single processor, and necessitated distribution among processors. Training such massi…
Lightning Talks of EduHPC 2022
The lightning talks at EduHPC provide an opportunity to share early results and insights on parallel and distributed computing (PDC) education and training efforts. The four lightning talks at EduHPC 2022 cover a range of topics in broaden…
High-Performance Big Data Computing
An in-depth overview of an emerging field that brings together high-performance computing, big data processing, and deep learning. Over the last decade, the exponential explosion of data known as big data has changed the way we understand …
Supercomputing Frontiers
As the share of supercomputers in Asia continues to increase, the relevance of supercomputing merits a supercomputing conference for Asia. Supercomputing Asia 2022 (SCA22) was an umbrella of notable supercomputing events that promoted a vib…
Annual Progress Report [The Ohio State University LLNL Subcontract: B643967]
at the drip line. To achieve this goal, we build upon a promising technique that emerged recently as a candidate to reach a fundamental description of low-energy binary reactions between light ions, that is, the ab initio no-core shell model c…
OMB-Py: Python Micro-Benchmarks for Evaluating Performance of MPI Libraries on HPC Systems
Python has become a dominant programming language for emerging areas like Machine Learning (ML), Deep Learning (DL), and Data Science (DS). An attractive feature of Python is that it provides easy-to-use programming interface while allowin…
Cross-layer Visualization and Profiling of Network and I/O Communication for HPC Clusters
Understanding and visualizing the full-stack performance trade-offs and interplay between HPC applications, MPI libraries, the communication fabric, and the file system is a challenging endeavor. Designing a holistic profiling and visualiz…
INAM: Cross-stack Profiling and Analysis of Communication in MPI-based Applications
Understanding the full-stack performance trade-offs and interplay among HPC applications, MPI libraries, the communication fabric, and the job scheduler is a challenging endeavor. Unfortunately, existing profiling tools are disjoint and on…
Efficient MPI-based Communication for GPU-Accelerated Dask Applications
Dask is a popular parallel and distributed computing framework, which rivals Apache Spark to enable task-based scalable processing of big data. The Dask Distributed library forms the basis of this computing engine and provides support for …
27th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2020) Technical program
HiPC 2020 is the 27th edition of the IEEE International Conference on High Performance Computing, Data, and Analytics. The conference focus is not only HPC but also includes Data Science. Due to the COVID-19 pandemic, this year the conferenc…
Future Directions of the Cyberinfrastructure for Sustained Scientific Innovation (CSSI) Program
The CSSI 2019 workshop was held on October 28-29, 2019, in Austin, Texas. The main objectives of this workshop were to (1) understand the impact of the CSSI program on the community over the last 9 years, (2) engage workshop participants i…