Explanipedia

Probabilistic Token Alignment for Large Language Model Fusion Open

Rui Zeng, Jia Liang, Cheng Han, Zhiwen Cao, Jiahao Liu, et al. · 2025

Training large language models (LLMs) from scratch can yield models with unique functionalities and strengths, but it is costly and often leads to redundant capabilities. A more cost-effective alternative is to fuse existing pre-trained LL…

VTBench: Evaluating Visual Tokenizers for Autoregressive Image Generation Open

Huawei Lin, Tong Geng, Zhaozhuo Xu, Weijie Zhao · 2025

Autoregressive (AR) models have recently shown strong performance in image generation, where a critical component is the visual tokenizer (VT) that maps continuous pixel inputs to discrete token sequences. The quality of the VT largely def…

ACiS: Complex Processing in the Switch Fabric Open

Pouya Haghi, Anqi Guo, Tong Geng, Anthony Skjellum, Martin Herbordt · 2025

Computer science Physics

For the last three decades a core use of FPGAs has been for processing communication: FPGA-based SmartNICs are in widespread use from the datacenter to IoT. Augmenting switches with FPGAs, however, has been less studied, but has numerous a…

Diff-PIC: Revolutionizing Particle-In-Cell Nuclear Fusion Simulation with Diffusion Models Open

Chuan Liu, Chunshu Wu, Shihui Cao, Mingkai Chen, James Chenhao Liang, et al. · 2024

Physics Computer science Engineering

The rapid development of AI highlights the pressing need for sustainable energy, a critical global challenge for decades. Nuclear fusion, generally seen as an ultimate solution, has been the focus of intensive research for nearly a century…

A systematic evaluation of computational methods for cell segmentation Open

Yuxing Wang, Junhan Zhao, Hongye Xu, Cheng Han, Zhiqiang Tao, et al. · 2024

Computer science Mathematics

Cell segmentation is a fundamental task in analyzing biomedical images. Many computational methods have been developed for cell segmentation and instance segmentation, but their performances are not well understood in various scenarios. We…

Inertial Confinement Fusion Forecasting via Large Language Models Open

Mingkai Chen, Taowen Wang, James Chenhao Liang, Chuan Liu, Chunshu Wu, et al. · 2024

Environmental science Physics Philosophy

Controlled fusion energy is deemed pivotal for the advancement of human civilization. In this study, we introduce LPI-LLM, a novel integration of Large Language Models (LLMs) with classical reservoir computing paradigms tailored…

Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression Open

Hao Feng, Boyuan Zhang, Fanjiang Ye, Min Si, Ching-Hsiang Chu, et al. · 2024

Computer science Geography Materials science

DLRM is a state-of-the-art recommendation system model that has gained widespread adoption across various industry applications. The large size of DLRM models, however, necessitates the use of multiple devices/GPUs for efficient training. …

Prototypical Transformer as Unified Motion Learners Open

Cheng Han, Yawen Lu, Guohao Sun, James C. Liang, Zhiwen Cao, et al. · 2024

Computer science Engineering

In this work, we introduce the Prototypical Transformer (ProtoFormer), a general and unified framework that approaches various motion tasks from a prototype perspective. ProtoFormer seamlessly integrates prototype learning with Transformer…

SmartFuse: Reconfigurable Smart Switches to Accelerate Fused Collectives in HPC Applications Open

Pouya Haghi, Cheng Tan, Anqi Guo, Chunshu Wu, Dongfang Liu, et al. · 2024

Computer science Materials science

Communication switches have sometimes been augmented to process collectives, e.g., in the IBM BlueGene and Mellanox SHArP switches. In this work, we find that there is a great acceleration opportunity through the further augmentation of sw…

Evaluating Emerging AI/ML Accelerators: IPU, RDU, and NVIDIA/AMD GPUs Open

Hongwu Peng, Caiwen Ding, Tong Geng, Sutanay Choudhury, Kevin Barker, et al. · 2024

Computer science Engineering

The relentless advancement of artificial intelligence (AI) and machine learning (ML) applications necessitates the development of specialized hardware accelerators capable of handling the increasing complexity and computational demands. Tr…

FPGA-Accelerated Range-Limited Molecular Dynamics Open

Chunshu Wu, Chen Yang, Sahan Bandara, Tong Geng, Anqi Guo, et al. · 2024

Computer science Materials science

Long timescale Molecular Dynamics (MD) simulation of small molecules is crucial in drug design and basic science. To accelerate a small data set that is executed for a large number of iterations, high efficiency is required. Recent work in…

Evaluating Emerging AI/ML Accelerators: IPU, RDU, and NVIDIA/AMD GPUs Open

Hongwu Peng, Caiwen Ding, Tong Geng, Sutanay Choudhury, Kevin Barker, et al. · 2023

Computer science Engineering Geography

The relentless advancement of artificial intelligence (AI) and machine learning (ML) applications necessitates the development of specialized hardware accelerators capable of handling the increasing complexity and computational demands. Tr…

Supporting Energy-Based Learning with an Ising Machine Substrate: A Case Study on RBM Open

Uday Kumar Reddy Vengalam, Yongchao Liu, Tong Geng, Hui Wu, Michael Huang · 2023

Computer science Mathematics Physics

Nature apparently does a lot of computation constantly. If we can harness some of that computation at an appropriate level, we can potentially perform certain types of computation (much) faster and more efficiently than we can do with a von…

LinGCN: Structural Linearized Graph Convolutional Network for Homomorphically Encrypted Inference Open

Hongwu Peng, Ran Ran, Yukui Luo, Jiahui Zhao, Shaoyi Huang, et al. · 2023

Computer science Engineering

The growth of Graph Convolution Network (GCN) model sizes has revolutionized numerous applications, surpassing human performance in areas such as personal healthcare and financial systems. The deployment of GCNs in the cloud raises privacy…

ClusterFormer: Clustering As A Universal Visual Learner Open

James C. Liang, Yiming Cui, Qifan Wang, Tong Geng, Wenguan Wang, et al. · 2023

Computer science

This paper presents CLUSTERFORMER, a universal vision model that is based on the CLUSTERing paradigm with TransFORMER. It comprises two novel designs: 1. recurrent cross-attention clustering, which reformulates the cross-attention mechanis…

Accel-GCN: High-Performance GPU Accelerator Design for Graph Convolution Networks Open

Xie Xi, Hongwu Peng, Amit Hasan, Shaoyi Huang, Jiahui Zhao, et al. · 2023

Computer science Engineering

Graph Convolutional Networks (GCNs) are pivotal in extracting latent information from graph data across various domains, yet their acceleration on mainstream GPUs is challenged by workload imbalance and memory access irregularity. To addre…

Dissecting Tensor Cores via Microbenchmarks: Latency, Throughput and Numeric Behaviors Open

Wei Sun, Ang Li, Tong Geng, Sander Stuijk, Henk Corporaal · 2022

Computer science

Tensor Cores have been an important unit to accelerate Fused Matrix Multiplication Accumulation (MMA) in all NVIDIA GPUs since Volta Architecture. To program Tensor Cores, users have to use either legacy wmma APIs or current mma APIs. Lega…

A length adaptive algorithm-hardware co-design of transformer on FPGA through sparse attention and dynamic pipelining Open

Hongwu Peng, Shaoyi Huang, Shiyang Chen, Bingbing Li, Tong Geng, et al. · 2022

Computer science

Transformers have been considered among the most important deep learning models since 2018, in part because they establish state-of-the-art (SOTA) records and could potentially replace existing Deep Neural Networks (DNNs). Despite the remarkabl…

CEAZ Open

Chengming Zhang, Sian Jin, Tong Geng, Jiannan Tian, Ang Li, et al. · 2022

Computer science Engineering

As HPC systems continue to grow to exascale, the amount of data that needs to be saved or transmitted is exploding. To this end, many previous works have studied using error-bounded lossy compressors to reduce the data size and improve the…

APNN-TC Open

Boyuan Feng, Yuke Wang, Tong Geng, Ang Li, Yufei Ding · 2021

Computer science Mathematics Economics

Over the years, accelerating neural networks with quantization has been widely studied. Unfortunately, prior efforts with diverse precisions (e.g., 1-bit weights and 2-bit activations) are usually restricted by limited precision support on…

Optimizing FPGA-based Accelerator Design for Large-Scale Molecular Similarity Search Open

Hongwu Peng, Shiyang Chen, Zhepeng Wang, Junhuan Yang, Scott Weitze, et al. · 2021

Computer science Geography

Molecular similarity search has been widely used in drug discovery to identify structurally similar compounds from large molecular databases rapidly. With the increasing size of chemical libraries, there is growing interest in the efficien…

Binary Complex Neural Network Acceleration on FPGA Open

Hongwu Peng, Shanglin Zhou, Scott Weitze, Jiaxin Li, Sahidul Islam, et al. · 2021

Computer science Physics Mathematics

Being able to learn from complex data with phase information is imperative for many signal processing applications. Today's real-valued deep neural networks (DNNs) have shown efficiency in latent information analysis but fall short when a…

CEAZ: Accelerating Parallel I/O via Hardware-Algorithm Co-Designed Adaptive Lossy Compression Open

Chengming Zhang, Sian Jin, Tong Geng, Jiannan Tian, Ang Li, et al. · 2021

Computer science Engineering

As HPC systems continue to grow to exascale, the amount of data that needs to be saved or transmitted is exploding. To this end, many previous works have studied using error-bounded lossy compressors to reduce the data size and improve the…

CEAZ: Accelerating Parallel I/O via Hardware-Algorithm Co-Design of Efficient and Adaptive Lossy Compression Open

Chengming Zhang, Sian Jin, Tong Geng, Jiannan Tian, Ang Li, et al. · 2021

Computer science Engineering

As supercomputers continue to grow to exascale, the amount of data that needs to be saved or transmitted is exploding. To this end, many previous works have studied using error-bounded lossy compressors to reduce the data size and improve …

APNN-TC: Accelerating Arbitrary Precision Neural Networks on Ampere GPU Tensor Cores Open

Boyuan Feng, Yuke Wang, Tong Geng, Ang Li, Yufei Ding · 2021

Computer science Mathematics Economics

Over the years, accelerating neural networks with quantization has been widely studied. Unfortunately, prior efforts with diverse precisions (e.g., 1-bit weights and 2-bit activations) are usually restricted by limited precision support on…

ARENA: Asynchronous Reconfigurable Accelerator Ring to Enable Data-Centric Parallel Computing Open

Cheng Tan, Chenhao Xie, Tong Geng, Andrés Márquez, Antonino Tumeo, et al. · 2021

Computer science Economics Engineering

The next generation HPC and data centers are likely to be reconfigurable and data-centric due to the trend of hardware specialization and the emergence of data-driven applications. In this work, we propose ARENA – an asynchronous reconfigu…

BCNN: Binary Complex Neural Network Open

Yanfei Li, Tong Geng, Ang Li, Huimin Yu · 2021

Computer science Mathematics Political science

Binarized neural networks, or BNNs, show great promise in edge-side applications with resource limited hardware, but raise the concerns of reduced accuracy. Motivated by the complex neural networks, in this paper we introduce complex repre…

Tong Geng