Explanipedia

A Closeness Centrality-based Circuit Partitioner for Quantum Simulations Open

Doru Thom Popovici, Harlin Lee, Mauro Del Ben, Nobuyasu Ito, Katherine Klymko, et al. · 2025

Simulating quantum circuits (QC) on high-performance computing (HPC) systems has become an essential method to benchmark algorithms and probe the potential of large-scale quantum computation despite the limitations of current quantum hardw…

Towards An Approach to Identify Divergences in Hardware Designs for HPC Workloads Open

Doru Thom Popovici, Vega Mario, Άγγελος Ιωάννου, Fabien Chaix, Dania Mosuli, et al. · 2025

Developing efficient hardware accelerators for mathematical kernels used in scientific applications and machine learning has traditionally been a labor-intensive task. These accelerators typically require low-level programming in Verilog o…

Flexible Multi-Dimensional FFTs for Plane Wave Density Functional Theory Codes Open

Doru Thom Popovici, Mauro Del Ben, Osni Marques, Andrew Canning · 2024

Multi-dimensional Fourier transforms are key mathematical building blocks that appear in a wide range of applications from materials science, physics, chemistry and even machine learning. Over the past years, a multitude of software packag…

Toward Practical Superconducting Accelerators for Machine Learning Using U-SFQ Open

Patricia Gonzalez-Guerrero, Kylie Huch, Nirmalendu Patra, Doru Thom Popovici, George Michelogiannakis · 2024

Computer science Physics

Most popular superconducting circuits operate on information carried by ps-wide, μV-tall, single flux quantum (SFQ) pulses. These circuits can operate at frequencies of hundreds of GHz with orders of magnitude lower switching energy than c…

Distributed memory, GPU accelerated Fock construction for hybrid, Gaussian basis density functional theory Open

David B. Williams‐Young, Andrey Asadchev, Doru Thom Popovici, David J. Clark, Jonathan M. Waldrop, et al. · 2023

Computer science Physics Mathematics

With the growing reliance of modern supercomputers on accelerator-based architecture such as graphics processing units (GPUs), the development and optimization of electronic structure methods to exploit these massively parallel resources ha…

SlimFit: Memory-Efficient Fine-Tuning of Transformer-based Models Using Training Dynamics Open

Arash Ardakani, Altan Haan, Shangyin Tan, Doru Thom Popovici, Alvin Cheung, et al. · 2023

Computer science Physics

Transformer-based models, such as BERT and ViT, have achieved state-of-the-art results across different natural language processing (NLP) and computer vision (CV) tasks. However, these models are extremely memory intensive during their fin…

Distributed Memory, GPU Accelerated Fock Construction for Hybrid, Gaussian Basis Density Functional Theory Open

David B. Williams‐Young, Andrey Asadchev, Doru Thom Popovici, David Clark, Johnathan Waldrop, et al. · 2023

Computer science Physics Mathematics

With the growing reliance of modern supercomputers on accelerator-based architectures such as GPUs, the development and optimization of electronic structure methods to exploit these massively parallel resources has become a recent priority.…

A systematic approach to improving data locality across Fourier transforms and linear algebra operations Open

Doru Thom Popovici, Andrew Canning, Zhengji Zhao, Lin‐Wang Wang, John Shalf · 2021

Computer science Mathematics Philosophy

The performance of most scientific applications depends on efficient mathematical libraries. For example, scientific applications like the plane wave based Density Functional Theory approach for electronic structure calculations use highl…

A High-Throughput Solver for Marginalized Graph Kernels on GPU Open

Yuhang Tang, Oğuz Selvitopi, Doru Thom Popovici, Aydın Buluç · 2020

Computer science Physics

We present the design and optimization of a linear solver on General Purpose GPUs for the efficient and high-throughput evaluation of the marginalized graph kernel between pairs of labeled graphs. The solver implements a preconditioned …

SPIRAL: Extreme Performance Portability Open

Franz Franchetti, Tze Meng Low, Doru Thom Popovici, Richard Veras, Daniele G. Spampinato, et al. · 2018

Computer science Mathematics

In this paper, we address the question of how to automatically map computational kernels to highly efficient code for a wide range of computing platforms and establish the correctness of the synthesized code. More specifically, we focus on…

Compilers, hands-off my hands-on optimizations Open

Richard Veras, Doru Thom Popovici, Tze Meng Low, Franz Franchetti · 2016

Computer science Philosophy Economics

Achieving high performance for compute-bound numerical kernels typically requires an expert to hand-select an appropriate set of single-instruction multiple-data (SIMD) instructions and then statically schedule them in order to hide their…

Doru Thom Popovici