Aamir Shafi
Scaling Large Language Model Training on Frontier with Low-Bandwidth Partitioning
Scaling up Large Language Model (LLM) training involves fitting a tremendous amount of training parameters across a limited number of workers. However, methods like ZeRO-3 that drastically reduce GPU memory pressure often incur heavy commun…
Accelerating Large Language Model Training with Hybrid GPU-based Compression
Data Parallelism (DP), Tensor Parallelism (TP), and Pipeline Parallelism (PP) are the three strategies widely adopted to enable fast and efficient Large Language Model (LLM) training. However, these approaches rely on data-intensive commun…
Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer
Large Language Models (LLMs) with long context capabilities are integral to complex tasks in natural language processing and computational biology, such as text generation and protein sequence analysis. However, training LLMs directly on e…
Demystifying the Communication Characteristics for Distributed Transformer Models
Deep learning (DL) models based on the transformer architecture have revolutionized many DL applications such as large language models (LLMs), vision transformers, audio generation, and time series prediction. Much of this progress has bee…
Phytocompounds as Promising Weapons against Lung Cancer: A Review
Lung cancer is the second most prevalent form of cancer in both men and women, which incurs major economic and public health losses. Notably, easy access to tobacco is the most important cause of pulmonary cancer, with 80%–90% of cases com…
Accelerating MPI AllReduce Communication with Efficient GPU-Based Compression Schemes on Modern GPU Clusters
With the increasing scale of High-Performance Computing (HPC) and Deep Learning (DL) applications through GPU adaptation, the seamless communication of data stored on GPUs has become a critical factor in enhancing overall application perfo…
The Case for Co-Designing Model Architectures with Hardware
While GPUs are responsible for training the vast majority of state-of-the-art deep learning models, the implications of their architecture are often overlooked when designing new deep learning (DL) models. As a consequence, modifying a DL …
Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference
In large language models like the Generative Pre-trained Transformer, the Mixture of Experts paradigm has emerged as a powerful technique for enhancing model expressiveness and accuracy. However, deploying GPT MoE models for parallel infer…
Tutorials
Recent advances in Machine and Deep Learning (ML/DL) have led to many exciting challenges and opportunities. Modern ML/DL and Data Science frameworks including TensorFlow, PyTorch, and Dask have emerged that offer high-performance training …
Flover: A Temporal Fusion Framework for Efficient Autoregressive Model Parallel Inference
Autoregressive models, despite their commendable performance in a myriad of generative tasks, face challenges stemming from their inherently sequential structure. Inference on these models, by design, harnesses a temporal dependency, where…
Feature Selection based Breast Cancer Prediction
Breast cancer is one of the main causes of mortality for women around the world. This mortality rate could be reduced if it is possible to diagnose breast cancer at the primary stage. It is hard to determine the causes of this disease that m…
MCR-DL: Mix-and-Match Communication Runtime for Deep Learning
In recent years, the training requirements of many state-of-the-art Deep Learning (DL) models have scaled beyond the compute and memory capabilities of a single processor, and necessitated distribution among processors. Training such massi…
Performance Characterization of using Quantization for DNN Inference on Edge Devices: Extended Version
Quantization is a popular technique used in Deep Neural Networks (DNN) inference to reduce the size of models and improve the overall numerical performance by exploiting native hardware. This paper attempts to conduct an elaborate performa…
Lightning Talks of EduHPC 2022
The lightning talks at EduHPC provide an opportunity to share early results and insights on parallel and distributed computing (PDC) education and training efforts. The four lightning talks at EduHPC 2022 cover a range of topics in broaden…
Ileo-Ileal Intussusception Presenting as Cullen’s Sign: A Case Report
Intussusception is defined as the telescoping of one segment of intestine into another, causing intestinal obstruction. This condition, although common in children, is considered rare in adults and is usually present secondary to…
Subarachnoid Haemorrhage: A Systematic Review
Subarachnoid haemorrhage arises from the accumulation of blood between the arachnoid and pia mater resulting from an aneurysmal rupture or traumatic head injury. Subarachnoid haemorrhage is a life-threatening emergency that requires prompt…
Fat Embolism Syndrome Complicated by Acute Pulmonary Thromboembolism after Bilateral Femoral Shaft Fractures: Two Nightmares in The Same Patient
Fat embolism syndrome (FES) is an uncommon but potentially fatal complication of orthopedic trauma, especially after a long bone fracture. A high level of suspicion should be kept in mind when a patient with a long bone fracture develops hypoxia, c…
Monkey Pox: What We Need to Know
Monkey pox (MP) is a zoonotic orthopox viral infectious disease clinically resembling smallpox but with lower mortality. The virus was first discovered in monkeys in 1958, and the first human infection was documented in 1970. Recently, cases have been seen r…
Buried Penis - A Hidden Problem in Obese Children
Buried penis is a condition that can affect boys and adult men. In this condition, the penis is of normal size but is hidden under the skin of the abdomen, thigh, or scrotum. We report a case of a 12-year-old obese boy who was brought by his…
OMB-Py: Python Micro-Benchmarks for Evaluating Performance of MPI Libraries on HPC Systems
Python has become a dominant programming language for emerging areas like Machine Learning (ML), Deep Learning (DL), and Data Science (DS). An attractive feature of Python is that it provides an easy-to-use programming interface while allowin…
INAM: Cross-stack Profiling and Analysis of Communication in MPI-based Applications
Understanding the full-stack performance trade-offs and interplay among HPC applications, MPI libraries, the communication fabric, and the job scheduler is a challenging endeavor. Unfortunately, existing profiling tools are disjoint and on…
Efficient MPI-based Communication for GPU-Accelerated Dask Applications
Dask is a popular parallel and distributed computing framework, which rivals Apache Spark to enable task-based scalable processing of big data. The Dask Distributed library forms the basis of this computing engine and provides support for …
27th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2020) Technical program
HiPC 2020 is the 27th edition of the IEEE International Conference on High Performance Computing, Data, and Analytics. The conference focuses not only on HPC but also on Data Science. Due to the COVID-19 pandemic, this year the conferenc…
Outcome-based (Engineering) Education (OBE): International Accreditation Practices
Outcome-based education (OBE) is a paradigm in which instruction and assessment/evaluation are explicitly designed for ensuring the attainment and mastery of predefined learning outcomes. OBE is now the underlying paradigm followed by g…
Clinical Interpretation of Detection of IgM Anti-Brucella Antibody in the Absence of IgG and Vice Versa; a Diagnostic Challenge for Clinicians
The non-specific and often misleading clinical presentation of active brucellosis has made it a diagnostic puzzle for treating physicians. Clinicians rely greatly on the detection of IgG and IgM anti-Brucella antibodies by ELISA. Different pa…
Student Outcomes Assessment Methodology for ABET Accreditation: A Case Study of Computer Science and Computer Information Systems Programs
Acquiring academic accreditation for degree programs is a top priority for universities across the world. This is understandable because accreditation not only leads to better content and delivery of these programs but also allows these in…
Performance Comparison of a Parallel Recommender Algorithm Across Three Hadoop-Based Frameworks
One of the challenges our society faces is the ever-increasing amount of data. Among existing platforms that address the system requirements, Hadoop is a framework widely used to store and analyze "big data". On the human side, one of the …
Additional file 2 of Parameter estimation of qualitative biological regulatory networks on high performance computing hardware
SMBIONET FILE 2. The SMBioNet file contains the source code of a qualitative model of Fibroblast Growth Factor (FGF) signalling in Drosophila melanogaster. (ZIP 1 kb)