Charith Mendis
ACT: Automatically Generating Compiler Backends from Tensor Accelerator ISA Descriptions
Tensor compilers play a key role in enabling high-performance implementations of deep learning workloads. These compilers rely on existing CPU and GPU code generation backends to generate device-specific code. Recently, many tensor acceler…
GALA: A High Performance Graph Neural Network Acceleration LAnguage and Compiler
Multiple frameworks and optimizations have been proposed for accelerating Graph Neural Network (GNN) workloads over the years, achieving sizable runtime performance improvements. However, we notice that existing systems usually explore opt…
PilotDB: Database-Agnostic Online Approximate Query Processing with A Priori Error Guarantees
After decades of research in approximate query processing (AQP), its adoption in the industry remains limited. Existing methods struggle to simultaneously provide user-specified error guarantees, eliminate maintenance overheads, and avoid …
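As background on the error guarantees AQP aims for: a minimal sketch of sampling-based approximate aggregation with a confidence interval, in plain Python. The data and sampling fraction are hypothetical, and this is the generic AQP idea, not PilotDB's method.

```python
import random, statistics, math

def approx_avg(values, sample_frac=0.01, z=1.96):
    """Estimate AVG(column) from a uniform sample with a ~95% confidence interval."""
    n = max(2, int(len(values) * sample_frac))
    sample = random.sample(values, n)
    mean = statistics.fmean(sample)
    stderr = statistics.stdev(sample) / math.sqrt(n)
    return mean, z * stderr  # estimate and half-width of the confidence interval

# Hypothetical usage: a column of one million order amounts.
orders = [random.uniform(5, 500) for _ in range(1_000_000)]
est, err = approx_avg(orders)
print(f"AVG ~= {est:.2f} +/- {err:.2f}")
```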
MISAAL: Synthesis-Based Automatic Generation of Efficient and Retargetable Semantics-Driven Optimizations
Using program synthesis to select instructions for and optimize input programs is receiving increasing attention. However, existing synthesis-based compilers face two major challenges that prohibit the deployment of program synthes…
PandasBench: A Benchmark for the Pandas API
The Pandas API has been central to the success of pandas and its alternatives. Despite its importance, there is no benchmark for it, and we argue that we cannot repurpose existing benchmarks (from other domains) for the Pandas API. In this…
COGNATE: Acceleration of Sparse Tensor Programs on Emerging Hardware using Transfer Learning
Sparse tensor programs are essential in deep learning and graph analytics, driving the need for optimized processing. To meet this demand, specialized hardware accelerators are being developed. Optimizing these programs for accelerators is…
Automated Verification of Soundness of DNN Certifiers
The uninterpretability of Deep Neural Networks (DNNs) hinders their use in safety-critical applications. Abstract Interpretation-based DNN certifiers provide promising avenues for building trust in DNNs. Unsoundness in the mathematical log…
SPLAT: A Framework for Optimised GPU Code-Generation for SParse reguLar ATtention
Multi-head self-attention (MHSA) mechanisms achieve state-of-the-art (SOTA) performance across natural language processing and vision tasks. However, their quadratic dependence on sequence lengths has bottlenecked inference speeds. To circ…
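The quadratic bottleneck is visible directly in the shapes of standard attention. A minimal NumPy sketch of single-head scaled dot-product attention (illustrative only, not SPLAT's optimized GPU kernel):

```python
import numpy as np

def attention(Q, K, V):
    """Standard attention: O(n^2) time and memory in sequence length n."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # (n, n) -- the quadratic term
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # (n, d)

n, d = 1024, 64
Q, K, V = (np.random.randn(n, d) for _ in range(3))
out = attention(Q, K, V)   # the (n, n) score matrix dominates cost as n grows
```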
TensorRight: Automated Verification of Tensor Graph Rewrites
Tensor compilers, essential for generating efficient code for deep learning models across various applications, employ tensor graph rewrites as one of the key optimizations. These rewrites optimize tensor computational graphs with the expe…
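As a concrete instance of the kind of rewrite such compilers apply: a hedged NumPy spot-check that transposing a product can be rewritten as the reversed product of transposes. This numeric check is illustrative only; a verifier like TensorRight must establish the rewrite for all shapes.

```python
import numpy as np

# Rewrite rule: transpose(matmul(A, B)) -> matmul(transpose(B), transpose(A))
A = np.random.randn(3, 4)
B = np.random.randn(4, 5)

lhs = (A @ B).T
rhs = B.T @ A.T
assert np.allclose(lhs, rhs)  # holds on this instance; verification covers all shapes
```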
Transforming the Hybrid Cloud for Emerging AI Workloads
This white paper, developed through close collaboration between IBM Research and UIUC researchers within the IIDAI Institute, envisions transforming hybrid cloud systems to meet the growing complexity of AI workloads through innovative, fu…
Hydride: A Retargetable and Extensible Synthesis-based Compiler for Modern Hardware Architectures
As modern hardware architectures evolve to support increasingly diverse, complex instruction sets for meeting the performance demands of modern workloads in image processing, deep learning, etc., it has become ever more crucial for compile…
TGLite: A Lightweight Programming Framework for Continuous-Time Temporal Graph Neural Networks
In recent years, Temporal Graph Neural Networks (TGNNs) have achieved great success in learning tasks for graphs that change over time. These dynamic/temporal graphs represent topology changes as either discrete static graph snapshots (cal…
ConstraintFlow: A DSL for Specification and Verification of Neural Network Analyses
We develop a declarative DSL, ConstraintFlow, that can be used to specify Abstract Interpretation-based DNN certifiers. In ConstraintFlow, programmers can easily define various existing and new abstract domains and transformers, all within just a few 10s of l…
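For readers unfamiliar with such transformers, here is a minimal hand-written Python sketch of an interval-domain ReLU transformer, the kind of definition a DSL like ConstraintFlow lets one express declaratively. The class and names are illustrative, not ConstraintFlow syntax.

```python
from dataclasses import dataclass

@dataclass
class Interval:
    lo: float
    hi: float

def relu_transformer(x: Interval) -> Interval:
    """Abstract transformer for ReLU over the interval domain.
    Sound: for any concrete v in [lo, hi], relu(v) lies in the output interval."""
    return Interval(max(0.0, x.lo), max(0.0, x.hi))

print(relu_transformer(Interval(-1.0, 2.0)))  # Interval(lo=0.0, hi=2.0)
```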
Dias: Dynamic Rewriting of Pandas Code
In recent years, dataframe libraries such as pandas have exploded in popularity. Due to their flexibility, they are increasingly used in ad-hoc exploratory data analysis (EDA) workloads. These workloads are diverse, including custom funct…
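To make "rewriting" concrete, here is the flavor of semantics-preserving pandas rewrite such a system targets, as a hedged example; the specific rule shown is illustrative and not necessarily one of Dias's rules.

```python
import pandas as pd

df = pd.DataFrame({"a": range(1_000), "b": range(1_000)})

# Original: row-wise apply, executed in Python one row at a time.
slow = df.apply(lambda row: row["a"] + row["b"], axis=1)

# Rewritten: equivalent vectorized expression, executed in C inside pandas.
fast = df["a"] + df["b"]

assert slow.equals(fast)
```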
FLuRKA: Fast and accurate unified Low-Rank & Kernel Attention
Many efficient approximate self-attention techniques have become prevalent since the inception of the transformer architecture. Two popular classes of these techniques are low-rank and kernel methods. Each of these methods has i…
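A minimal NumPy sketch of the kernel-method family the abstract refers to: replacing softmax with a feature map phi makes attention linear in sequence length. This is the generic linearized-attention idea, not FLuRKA's unified method; the feature map here is a stand-in.

```python
import numpy as np

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
    """Kernelized attention: phi(Q) @ (phi(K).T @ V), roughly O(n * d^2) not O(n^2 * d)."""
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                                   # (d, d) summary, independent of n
    Z = Qp @ Kp.sum(axis=0, keepdims=True).T        # normalizer, shape (n, 1)
    return (Qp @ KV) / Z

n, d = 1024, 64
Q, K, V = (np.random.randn(n, d) for _ in range(3))
out = linear_attention(Q, K, V)   # never materializes an (n, n) matrix
```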
SENSEi: Input-Sensitive Compilation for Accelerating GNNs
Over the years, many frameworks and optimization techniques have been proposed to accelerate graph neural networks (GNNs). Compared to the optimizations explored in these systems, we observe that different matrix re-associations of GNN com…
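The re-association the abstract mentions shows up in the core GNN aggregation: for a sparse adjacency A (n by n), features X (n by f), and weights W (f by h), (AX)W and A(XW) are mathematically equal but can differ greatly in cost depending on sparsity and on f versus h. A hedged SciPy sketch:

```python
import numpy as np
import scipy.sparse as sp

n, f, h = 10_000, 256, 32
A = sp.random(n, n, density=1e-4, format="csr")   # sparse adjacency
X = np.random.randn(n, f)
W = np.random.randn(f, h)

out1 = (A @ X) @ W   # SpMM on wide (n x f) features, then a dense (n x f)(f x h)
out2 = A @ (X @ W)   # shrink to (n x h) densely first, then a cheaper SpMM

assert np.allclose(out1, out2)   # same result; which is faster depends on the input
```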
Learning Large Graph Property Prediction via Graph Segment Training
Learning to predict properties of large graphs is challenging because each prediction requires the knowledge of an entire graph, while the amount of memory available during training is bounded. Here we propose Graph Segment Training (GST),…
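A minimal sketch of the segment idea under stated assumptions: partition each large graph into segments, embed only a sampled subset per training step to bound memory, and combine segment embeddings into the graph-level prediction. Function names are illustrative, not the GST API.

```python
import random

def segment_nodes(num_nodes, num_segments):
    """Partition node ids into contiguous segments (a real system also handles cut edges)."""
    size = -(-num_nodes // num_segments)  # ceiling division
    return [range(i, min(i + size, num_nodes)) for i in range(0, num_nodes, size)]

def training_step(segments, embed_segment, k=2):
    """Embed only k sampled segments per step; the rest stay out of memory (illustrative)."""
    sampled = random.sample(segments, k)
    embeddings = [embed_segment(seg) for seg in sampled]
    return sum(embeddings) / len(embeddings)   # combined graph-level embedding

segs = segment_nodes(num_nodes=1_000_000, num_segments=8)
graph_emb = training_step(segs, embed_segment=lambda seg: float(len(seg)))
```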
COMET: Neural Cost Model Explanation Framework
Cost models predict the cost of executing given assembly code basic blocks on a specific microarchitecture. Recently, neural cost models have been shown to be fairly accurate and easy to construct. They can replace heavily engineered analy…
WACO: Learning Workload-Aware Co-optimization of the Format and Schedule of a Sparse Tensor Program
Leveraging the large number of zeros in sparse tensors offers a powerful way to solve complex problems efficiently in many applications. However, optimizing the performance of those applications poses a challenge. Sparse te…
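To ground "format": a minimal sketch of compressed sparse row (CSR), one of the storage formats such a co-optimizer chooses among, together with the matrix-vector product it enables. Illustrative only; WACO also co-selects the loop schedule.

```python
import numpy as np

def csr_spmv(indptr, indices, data, x):
    """y = A @ x where A is stored in CSR: only nonzeros are touched."""
    y = np.zeros(len(indptr) - 1)
    for row in range(len(y)):
        for k in range(indptr[row], indptr[row + 1]):
            y[row] += data[k] * x[indices[k]]
    return y

# A = [[1, 0, 2],
#      [0, 0, 3]]
indptr, indices, data = [0, 2, 3], [0, 2, 2], [1.0, 2.0, 3.0]
print(csr_spmv(indptr, indices, data, np.ones(3)))   # [3. 3.]
```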
GRANITE: A Graph Neural Network Model for Basic Block Throughput Estimation
Analytical hardware performance models yield swift estimation of desired hardware performance metrics. However, developing these analytical models for modern processors with sophisticated microarchitectures is an extremely laborious task a…
All you need is superword-level parallelism: systematic control-flow vectorization with SLP
Superword-level parallelism (SLP) vectorization is a proven technique for vectorizing straight-line code. It works by replacing independent, isomorphic instructions with equivalent vector instructions. Larsen and Amarasinghe originally pro…
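The packing idea, shown as a NumPy analogy rather than compiler IR (SLP itself operates on straight-line scalar instructions, not Python): independent, isomorphic scalar statements become one vector operation.

```python
import numpy as np

b = np.array([1.0, 2.0, 3.0, 4.0])
c = np.array([10.0, 20.0, 30.0, 40.0])

# Scalar form: independent, isomorphic statements -- exactly what SLP packs.
a = np.empty(4)
a[0] = b[0] + c[0]
a[1] = b[1] + c[1]
a[2] = b[2] + c[2]
a[3] = b[3] + c[3]

# Vectorized form: the single vector-instruction equivalent.
a_vec = b + c
assert np.array_equal(a, a_vec)
```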
VeGen: a vectorizer generator for SIMD and beyond
Vector instructions are ubiquitous in modern processors. Traditional compiler auto-vectorization techniques have focused on targeting single instruction multiple data (SIMD) instructions. However, these auto-vectorization techniques are no…
A Learned Performance Model for Tensor Processing Units
Accurate hardware performance models are critical to efficient code generation. They can be used by compilers to make heuristic decisions, by superoptimizers as a minimization objective, or by autotuners to find an optimal configuration fo…
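One of the uses the abstract lists, sketched: an autotuner that queries a learned cost model instead of timing each candidate on hardware. The `predict_cost` model and the config space here are hypothetical stand-ins.

```python
import itertools

def autotune(configs, predict_cost):
    """Pick the config the learned model scores cheapest -- no hardware runs needed."""
    return min(configs, key=predict_cost)

# Hypothetical search space: tile sizes for a matrix multiply.
configs = list(itertools.product([16, 32, 64, 128], repeat=2))

# Stand-in for a learned model; a real one would be a trained network.
predict_cost = lambda cfg: abs(cfg[0] - 64) + abs(cfg[1] - 32)

print(autotune(configs, predict_cost))   # (64, 32)
```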
DiffTune: Optimizing CPU Simulator Parameters with Learned Differentiable Surrogates
CPU simulators are useful tools for modeling CPU execution behavior. However, they suffer from inaccuracies due to the cost and complexity of setting their fine-grained parameters, such as the latencies of individual instructions. This com…
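A minimal PyTorch sketch of the surrogate idea under stated assumptions: once a differentiable surrogate of the simulator exists, simulator parameters can be fit to measured timings by gradient descent. The surrogate network and the data here are stand-ins, not DiffTune's models.

```python
import torch

# Stand-in differentiable surrogate: maps simulator parameters to a predicted timing.
surrogate = torch.nn.Sequential(
    torch.nn.Linear(4, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1)
)

params = torch.zeros(4, requires_grad=True)   # e.g., per-class instruction latencies
measured = torch.tensor([[3.5]])              # ground-truth timing from hardware
opt = torch.optim.Adam([params], lr=0.05)

for _ in range(200):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(surrogate(params.unsqueeze(0)), measured)
    loss.backward()   # gradients flow through the surrogate, not the simulator itself
    opt.step()
```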
Ithemal: Accurate, Portable and Fast Basic Block Throughput Estimation using Deep Neural Networks
Predicting the number of clock cycles a processor takes to execute a block of assembly instructions in steady state (the throughput) is important for both compiler designers and performance engineers. Building an analytical model to do so …
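A heavily simplified PyTorch sketch of the general recipe such neural throughput models follow: tokenize the instructions of a basic block, run a recurrent encoder, and regress a single throughput value. Sizes and tokenization are made up; this is not Ithemal's exact hierarchical architecture.

```python
import torch
import torch.nn as nn

class ThroughputModel(nn.Module):
    def __init__(self, vocab_size=256, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, 1)

    def forward(self, tokens):                 # tokens: (batch, block_len) instruction ids
        x = self.embed(tokens)
        _, (h, _) = self.rnn(x)
        return self.head(h[-1]).squeeze(-1)    # predicted steady-state cycles

model = ThroughputModel()
block = torch.randint(0, 256, (1, 12))         # a toy 12-token basic block
print(model(block))
```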
Making Caches Work for Graph Analytics
Modern hardware systems are heavily underutilized when running large-scale graph applications. While many in-memory graph frameworks have made substantial progress in optimizing these applications, we show that it is still possible to achi…