Kangwook Lee
ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs
While most autoregressive LLMs are constrained to one-by-one decoding, diffusion LLMs (dLLMs) have attracted growing interest for their potential to dramatically accelerate inference through parallel decoding. Despite this promise, the con…
LLM-Lasso: A Robust Framework for Domain-Informed Feature Selection and Regularization
We introduce LLM-Lasso, a novel framework that leverages large language models (LLMs) to guide feature selection in Lasso $\ell_1$ regression. Unlike traditional methods that rely solely on numerical data, LLM-Lasso incorporates domain-spe…
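A minimal sketch of how domain-informed, per-feature penalty weights could enter an $\ell_1$ regression, under the assumption that LLM-derived relevance scores are available as a plain array (the `llm_relevance` values here are hypothetical placeholders, and the column-rescaling trick is a standard way to realize a weighted Lasso, not necessarily the paper's exact formulation):

```python
# Sketch: weighted Lasso where per-feature penalty weights come from an external
# relevance score (hypothetical `llm_relevance`, standing in for LLM-derived
# domain knowledge). Solving
#   min_b ||y - X b||^2 + alpha * sum_j w_j |b_j|
# is equivalent to ordinary Lasso on columns X_j / w_j, followed by rescaling
# the recovered coefficients by 1 / w_j.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_beta = np.array([3.0, -2.0] + [0.0] * 8)
y = X @ true_beta + 0.1 * rng.normal(size=200)

# Hypothetical relevance scores in (0, 1]; higher relevance -> smaller penalty.
llm_relevance = np.array([0.9, 0.8] + [0.2] * 8)
w = 1.0 / llm_relevance                 # per-feature penalty weight

X_scaled = X / w                        # column-rescaling trick
model = Lasso(alpha=0.1).fit(X_scaled, y)
beta_weighted = model.coef_ / w         # map coefficients back to original scale

print(np.round(beta_weighted, 2))
```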
VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data
Process Reward Models (PRMs) have proven effective at enhancing mathematical reasoning for Large Language Models (LLMs) by leveraging increased inference-time computation. However, they are predominantly trained on mathematical data and th…
Self-Improving Transformers Overcome Easy-to-Hard and Length Generalization Challenges
Large language models often struggle with length generalization and solving complex problem instances beyond their training distribution. We present a self-improvement approach where models iteratively generate and learn from their own sol…
Task Vectors in In-Context Learning: Emergence, Formation, and Benefit
In-context learning is a remarkable capability of transformers, referring to their ability to adapt to specific tasks based on a short history or context. Previous research has found that task-specific information is locally encoded within…
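A conceptual sketch of the task-vector idea referenced in the abstract, not the paper's exact procedure: cache an intermediate hidden state from a forward pass on a few-shot prompt, then patch it into the same layer during a zero-shot forward pass. Here `model`, `layer`, and the tensor layout are assumptions.

```python
# Conceptual sketch (assumes the hooked layer returns a (batch, seq, hidden)
# tensor; `model` and `layer` are placeholders, not a specific library API).
import torch

def capture_hidden(model, layer, inputs):
    """Run a forward pass and return the layer's output at the last position."""
    cache = {}
    def hook(_module, _inputs, output):
        cache["h"] = output[:, -1, :].detach()   # (batch, hidden)
    handle = layer.register_forward_hook(hook)
    with torch.no_grad():
        model(inputs)
    handle.remove()
    return cache["h"]

def patch_hidden(model, layer, inputs, task_vector):
    """Re-run with the last-position hidden state replaced by the task vector."""
    def hook(_module, _inputs, output):
        output = output.clone()
        output[:, -1, :] = task_vector           # inject the cached "task vector"
        return output                            # returned value replaces the output
    handle = layer.register_forward_hook(hook)
    with torch.no_grad():
        out = model(inputs)
    handle.remove()
    return out
```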
Multi-Bin Batching for Increasing LLM Inference Throughput
As large language models (LLMs) grow in popularity for their diverse capabilities, improving the efficiency of their inference systems has become increasingly critical. Batching LLM requests is a critical step in scheduling the inference j…
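A generic illustration of length-based request binning before batching, which is the intuition the title suggests; this is not necessarily the paper's algorithm, and the `estimated_tokens` field is a hypothetical request attribute.

```python
# Generic sketch: group requests into bins by an estimated output length so that
# each batch contains similarly sized jobs, reducing the padding/straggler waste
# of mixing short and long requests in one batch.
from collections import defaultdict

def make_batches(requests, bin_edges=(64, 256, 1024), batch_size=8):
    """requests: list of dicts with a hypothetical 'estimated_tokens' field."""
    bins = defaultdict(list)
    for req in requests:
        # Index of the first edge the estimate fits under (last bin if none).
        idx = next((i for i, e in enumerate(bin_edges)
                    if req["estimated_tokens"] <= e), len(bin_edges))
        bins[idx].append(req)

    batches = []
    for idx in sorted(bins):
        queue = bins[idx]
        for start in range(0, len(queue), batch_size):
            batches.append(queue[start:start + batch_size])
    return batches

reqs = [{"id": i, "estimated_tokens": t}
        for i, t in enumerate([30, 50, 500, 40, 900, 70, 2000, 45])]
for batch in make_batches(reqs, batch_size=2):
    print([r["id"] for r in batch], [r["estimated_tokens"] for r in batch])
```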
Forward and Inverse Simulation of Pseudo-Two-Dimensional Model of Lithium-Ion Batteries Using Neural Networks
In this work, we address the challenges posed by the high nonlinearity of the Butler-Volmer (BV) equation in forward and inverse simulations of the pseudo-two-dimensional (P2D) model using the physics-informed neural network (PINN) framewo…
Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance
State-of-the-art text-to-image (T2I) diffusion models often struggle to generate rare compositions of concepts, e.g., objects with unusual attributes. In this paper, we show that the compositional generation power of diffusion models on su…
Parameter-Efficient Fine-Tuning of State Space Models
Deep State Space Models (SSMs), such as Mamba (Gu & Dao, 2024), have become powerful tools for language modeling, offering high performance and linear scalability with sequence length. However, the application of parameter-efficient fine-t…
Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition
Large Language Models (LLMs) have demonstrated remarkable in-context learning (ICL) capabilities. In this study, we explore a surprising phenomenon related to ICL: LLMs can perform multiple, computationally distinct ICL tasks simultaneousl…
ENTP: Encoder-only Next Token Prediction
Next-token prediction is conventionally done using decoder-only Transformers with causal attention, as this approach allows for efficient reuse of keys and values. If we were not compute-limited, should we still use decoder-only Trans…
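A rough sketch of the contrast the abstract alludes to, with a hypothetical `model(tokens, attn_mask=...)` interface (not a specific library API): a decoder with a causal mask produces all next-token predictions in one pass and can cache keys/values, while an encoder-only model with full bidirectional attention must re-run on each prefix.

```python
# Sketch only: `model` and its call signature are placeholders.
import torch

def causal_mask(seq_len):
    # True above the diagonal = future positions that must be masked out.
    return torch.triu(torch.ones(seq_len, seq_len), diagonal=1).bool()

def decoder_predictions(model, tokens):
    """One pass over the full sequence with a causal mask; KV caching applies."""
    return model(tokens, attn_mask=causal_mask(tokens.size(1)))

def encoder_only_predictions(model, tokens):
    """One pass per prefix, full attention within each prefix (more compute)."""
    outputs = []
    for t in range(1, tokens.size(1) + 1):
        out = model(tokens[:, :t])        # no mask: bidirectional attention
        outputs.append(out[:, -1])        # prediction for the next token
    return torch.stack(outputs, dim=1)
```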
Looped Transformers for Length Generalization
Recent work has shown that Transformers trained from scratch can successfully solve various arithmetic and algorithmic tasks, such as adding numbers and computing parity. While these Transformers generalize well on unseen inputs of the sam…
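A minimal sketch of the looped-transformer idea in general terms (the block choice, loop count, and stopping rule here are placeholders, not the paper's recipe): a single weight-tied block is applied repeatedly, and the number of iterations can be increased at inference time for longer or harder inputs.

```python
# Minimal weight-tied loop over one Transformer block (PyTorch).
import torch.nn as nn

class LoopedTransformer(nn.Module):
    def __init__(self, d_model=128, nhead=4, num_loops=8):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True)
        self.num_loops = num_loops

    def forward(self, x, num_loops=None):
        # Reuse the same block's weights for every iteration.
        steps = num_loops if num_loops is not None else self.num_loops
        for _ in range(steps):
            x = self.block(x)
        return x
```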
Buffer-based Gradient Projection for Continual Federated Learning
Continual Federated Learning (CFL) is essential for enabling real-world applications where multiple decentralized clients adaptively learn from continuous data streams. A significant challenge in CFL is mitigating catastrophic forgetting, …
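A generic sketch of buffer-based gradient projection in the spirit of A-GEM, shown only to make the "projection" idea concrete; it is not necessarily the paper's exact update rule: if the current gradient conflicts with the gradient computed on buffered past samples, the conflicting component is projected out before the optimizer step.

```python
# A-GEM-style projection (generic illustration).
import torch

def project_gradient(grad, buffer_grad, eps=1e-12):
    """grad, buffer_grad: flattened gradient vectors of equal length."""
    dot = torch.dot(grad, buffer_grad)
    if dot < 0:  # current update would conflict with past (buffered) data
        grad = grad - (dot / (buffer_grad.norm() ** 2 + eps)) * buffer_grad
    return grad

def flatten_grads(model):
    """Concatenate all parameter gradients into one vector."""
    return torch.cat([p.grad.reshape(-1) for p in model.parameters()
                      if p.grad is not None])
```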
From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data
Recent studies have shown that Large Language Models (LLMs) struggle to accurately retrieve information and maintain reasoning capabilities when processing long-context inputs. To address these limitations, we propose a finetuning approach…
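An illustrative generator for synthetic key-value retrieval data of the kind such finetuning could use; the format details (UUID keys, integer values, prompt wording) are assumptions for illustration, not taken from the paper.

```python
# Each example embeds many random key-value pairs and asks for the value of one key.
import random
import uuid

def make_example(num_pairs=50, seed=None):
    rng = random.Random(seed)
    pairs = {str(uuid.UUID(int=rng.getrandbits(128))): rng.randint(0, 10**6)
             for _ in range(num_pairs)}
    query_key = rng.choice(list(pairs))
    context = "\n".join(f"{k}: {v}" for k, v in pairs.items())
    prompt = (f"Below is a list of key-value pairs.\n{context}\n\n"
              f"What is the value for key {query_key}?")
    return {"prompt": prompt, "answer": str(pairs[query_key])}

example = make_example(num_pairs=5, seed=0)
print(example["prompt"])
print("->", example["answer"])
```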
Dual Operating Modes of In-Context Learning
In-context learning (ICL) exhibits dual operating modes: task learning, i.e., acquiring a new skill from in-context samples, and task retrieval, i.e., locating and activating a relevant pretrained skill. Recent theoretical work investigate…
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
State-space models (SSMs), such as Mamba (Gu & Dao, 2023), have been proposed as alternatives to Transformer networks in language modeling, by incorporating gating, convolutions, and input-dependent token selection to mitigate the quadrati…
Can MLLMs Perform Text-to-Image In-Context Learning?
The evolution from Large Language Models (LLMs) to Multimodal Large Language Models (MLLMs) has spurred research into extending In-Context Learning (ICL) to its multimodal counterpart. Existing studies of this kind have primarily concentrated on i…
Looped Transformers are Better at Learning Learning Algorithms
Transformers have demonstrated effectiveness at solving data-fitting problems in-context from various (latent) models, as reported by Garg et al. However, the absence of an inherent iterative structure in the transformer architecture prese…
Super-Resolution Emulation of Large Cosmological Fields with a 3D Conditional Diffusion Model
High-resolution (HR) simulations in cosmology, in particular when including baryons, can take millions of CPU hours. On the other hand, low-resolution (LR) dark matter simulations of the same cosmological volume use minimal computing resou…
Image Clustering Conditioned on Text Criteria
Classical clustering methods do not provide users with direct control of the clustering results, and the clustering results may not be consistent with the relevant criterion that a user has in mind. In this work, we present a new methodolo…
The Expressive Power of Low-Rank Adaptation
Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method that leverages low-rank adaptation of weight matrices, has emerged as a prevalent technique for fine-tuning pre-trained models such as large language models and diffusion…
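For reference, the standard LoRA formulation the abstract refers to: the frozen weight $W$ is adapted by a rank-$r$ product $BA$, giving an effective weight $W + \frac{\alpha}{r} BA$. The sketch below shows that formulation; hyperparameter values are illustrative.

```python
# Minimal LoRA wrapper around a frozen linear layer (PyTorch).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)              # freeze pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # starts at zero update
        self.scale = alpha / r

    def forward(self, x):
        # Effective weight: W + (alpha / r) * B @ A
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```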
Market Structure Analysis of Revenue of International Construction Professional Service (I-CPS): A Country-Level Analysis
International construction professional service (I-CPS) refers to a knowledge-intensive professional service (KIPS), such as architecture, engineering, and consultancy, which uses technology/human capital as its major input and is better p…
Intuitive Access to Smartphone Settings Using Relevance Model Trained by Contrastive Learning
The more new features are added to smartphones, the harder it becomes for users to find them. This is because feature names are usually short, and there are just too many to remember. In such cases, users may want to as…
Predictive Pipelined Decoding: A Compute-Latency Trade-off for Exact LLM Decoding
This paper presents "Predictive Pipelined Decoding (PPD)," an approach that speeds up greedy decoding in Large Language Models (LLMs) while maintaining the exact same output as the original decoding. Unlike conventional strategies, PPD emp…
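For context, this is the standard greedy-decoding loop that PPD is described as accelerating while preserving the exact same output; the sketch is the baseline only, not the PPD method, and `model` returning a `(batch, seq, vocab)` logits tensor is an assumption.

```python
# Baseline greedy decoding loop (illustration; `model` is a placeholder).
import torch

@torch.no_grad()
def greedy_decode(model, input_ids, max_new_tokens=32, eos_id=None):
    ids = input_ids
    for _ in range(max_new_tokens):
        logits = model(ids)                                  # (batch, seq, vocab) assumed
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=1)
        if eos_id is not None and (next_id == eos_id).all():
            break
    return ids
```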
Mini-Batch Optimization of Contrastive Loss
Contrastive learning has gained significant attention as a method for self-supervised learning. The contrastive loss function ensures that embeddings of positive sample pairs (e.g., different samples from the same class or different views …
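A standard mini-batch contrastive (InfoNCE-style) loss, shown to make the setup concrete; this is the generic formulation commonly used in self-supervised learning, not the paper's specific analysis.

```python
# Within a batch, each anchor's positive is its augmented view; all other
# samples in the batch act as negatives.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """z1, z2: (batch, dim) embeddings of two views of the same samples."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / temperature          # (batch, batch) similarity matrix
    labels = torch.arange(z1.size(0))         # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

z1, z2 = torch.randn(32, 128), torch.randn(32, 128)
print(info_nce(z1, z2).item())
```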
Teaching Arithmetic to Small Transformers
Large language models like GPT-4 exhibit emergent capabilities across general-purpose tasks, such as basic arithmetic, when trained on extensive text data, even though these tasks are not explicitly encoded by the unsupervised, next-token …
Variation Spaces for Multi-Output Neural Networks: Insights on Multi-Task Learning and Network Compression
This paper introduces a novel theoretical framework for the analysis of vector-valued neural networks through the development of vector-valued variation spaces, a new class of reproducing kernel Banach spaces. These spaces emerge from stud…
DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models
Learning from human feedback has been shown to improve text-to-image models. These techniques first learn a reward function that captures what humans care about in the task and then improve the models based on the learned reward function. …