Kangwook Lee
ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs
While most autoregressive LLMs are constrained to one-by-one decoding, diffusion LLMs (dLLMs) have attracted growing interest for their potential to dramatically accelerate inference through parallel decoding. Despite this promise, the con…
LLM-Lasso: A Robust Framework for Domain-Informed Feature Selection and Regularization
We introduce LLM-Lasso, a novel framework that leverages large language models (LLMs) to guide feature selection in Lasso $\ell_1$ regression. Unlike traditional methods that rely solely on numerical data, LLM-Lasso incorporates domain-spe…
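A minimal sketch of how domain-informed, per-feature penalty weights could enter an $\ell_1$ regression, under the assumption that LLM-derived relevance scores are available as a plain array (the `llm_relevance` values here are hypothetical placeholders, and the column-rescaling trick is a standard way to realize a weighted Lasso, not necessarily the paper's exact formulation):

```python
# Sketch: weighted Lasso where per-feature penalty weights come from an external
# relevance score (hypothetical `llm_relevance`, standing in for LLM-derived
# domain knowledge). Solving
#   min_b ||y - X b||^2 + alpha * sum_j w_j |b_j|
# is equivalent to ordinary Lasso on columns X_j / w_j, followed by rescaling
# the recovered coefficients by 1 / w_j.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_beta = np.array([3.0, -2.0] + [0.0] * 8)
y = X @ true_beta + 0.1 * rng.normal(size=200)

# Hypothetical relevance scores in (0, 1]; higher relevance -> smaller penalty.
llm_relevance = np.array([0.9, 0.8] + [0.2] * 8)
w = 1.0 / llm_relevance                 # per-feature penalty weight

X_scaled = X / w                        # column-rescaling trick
model = Lasso(alpha=0.1).fit(X_scaled, y)
beta_weighted = model.coef_ / w         # map coefficients back to original scale

print(np.round(beta_weighted, 2))
```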
VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data
Process Reward Models (PRMs) have proven effective at enhancing mathematical reasoning for Large Language Models (LLMs) by leveraging increased inference-time computation. However, they are predominantly trained on mathematical data and th…
Self-Improving Transformers Overcome Easy-to-Hard and Length Generalization Challenges
Large language models often struggle with length generalization and solving complex problem instances beyond their training distribution. We present a self-improvement approach where models iteratively generate and learn from their own sol…
Task Vectors in In-Context Learning: Emergence, Formation, and Benefit
In-context learning is a remarkable capability of transformers, referring to their ability to adapt to specific tasks based on a short history or context. Previous research has found that task-specific information is locally encoded within…
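A conceptual sketch of the task-vector idea referenced in the abstract, not the paper's exact procedure: cache an intermediate hidden state from a forward pass on a few-shot prompt, then patch it into the same layer during a zero-shot forward pass. Here `model`, `layer`, and the tensor layout are assumptions.

```python
# Conceptual sketch (assumes the hooked layer returns a (batch, seq, hidden)
# tensor; `model` and `layer` are placeholders, not a specific library API).
import torch

def capture_hidden(model, layer, inputs):
    """Run a forward pass and return the layer's output at the last position."""
    cache = {}
    def hook(_module, _inputs, output):
        cache["h"] = output[:, -1, :].detach()   # (batch, hidden)
    handle = layer.register_forward_hook(hook)
    with torch.no_grad():
        model(inputs)
    handle.remove()
    return cache["h"]

def patch_hidden(model, layer, inputs, task_vector):
    """Re-run with the last-position hidden state replaced by the task vector."""
    def hook(_module, _inputs, output):
        output = output.clone()
        output[:, -1, :] = task_vector           # inject the cached "task vector"
        return output                            # returned value replaces the output
    handle = layer.register_forward_hook(hook)
    with torch.no_grad():
        out = model(inputs)
    handle.remove()
    return out
```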
Multi-Bin Batching for Increasing LLM Inference Throughput
As large language models (LLMs) grow in popularity for their diverse capabilities, improving the efficiency of their inference systems has become increasingly critical. Batching LLM requests is a critical step in scheduling the inference j…
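A generic illustration of length-based request binning before batching, which is the intuition the title suggests; this is not necessarily the paper's algorithm, and the `estimated_tokens` field is a hypothetical request attribute.

```python
# Generic sketch: group requests into bins by an estimated output length so that
# each batch contains similarly sized jobs, reducing the padding/straggler waste
# of mixing short and long requests in one batch.
from collections import defaultdict

def make_batches(requests, bin_edges=(64, 256, 1024), batch_size=8):
    """requests: list of dicts with a hypothetical 'estimated_tokens' field."""
    bins = defaultdict(list)
    for req in requests:
        # Index of the first edge the estimate fits under (last bin if none).
        idx = next((i for i, e in enumerate(bin_edges)
                    if req["estimated_tokens"] <= e), len(bin_edges))
        bins[idx].append(req)

    batches = []
    for idx in sorted(bins):
        queue = bins[idx]
        for start in range(0, len(queue), batch_size):
            batches.append(queue[start:start + batch_size])
    return batches

reqs = [{"id": i, "estimated_tokens": t}
        for i, t in enumerate([30, 50, 500, 40, 900, 70, 2000, 45])]
for batch in make_batches(reqs, batch_size=2):
    print([r["id"] for r in batch], [r["estimated_tokens"] for r in batch])
```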
Forward and Inverse Simulation of Pseudo-Two-Dimensional Model of Lithium-Ion Batteries Using Neural Networks
In this work, we address the challenges posed by the high nonlinearity of the Butler-Volmer (BV) equation in forward and inverse simulations of the pseudo-two-dimensional (P2D) model using the physics-informed neural network (PINN) framewo…
Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance
State-of-the-art text-to-image (T2I) diffusion models often struggle to generate rare compositions of concepts, e.g., objects with unusual attributes. In this paper, we show that the compositional generation power of diffusion models on su…
Parameter-Efficient Fine-Tuning of State Space Models
Deep State Space Models (SSMs), such as Mamba (Gu & Dao, 2024), have become powerful tools for language modeling, offering high performance and linear scalability with sequence length. However, the application of parameter-efficient fine-t…
Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition
Large Language Models (LLMs) have demonstrated remarkable in-context learning (ICL) capabilities. In this study, we explore a surprising phenomenon related to ICL: LLMs can perform multiple, computationally distinct ICL tasks simultaneousl…
ENTP: Encoder-only Next Token Prediction
Next-token prediction is conventionally done using decoder-only Transformers with causal attention, as this approach allows for efficient reuse of keys and values. If we were not compute-limited, should we still use decoder-only Trans…
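A rough sketch of the contrast the abstract alludes to, with a hypothetical `model(tokens, attn_mask=...)` interface (not a specific library API): a decoder with a causal mask produces all next-token predictions in one pass and can cache keys/values, while an encoder-only model with full bidirectional attention must re-run on each prefix.

```python
# Sketch only: `model` and its call signature are placeholders.
import torch

def causal_mask(seq_len):
    # True above the diagonal = future positions that must be masked out.
    return torch.triu(torch.ones(seq_len, seq_len), diagonal=1).bool()

def decoder_predictions(model, tokens):
    """One pass over the full sequence with a causal mask; KV caching applies."""
    return model(tokens, attn_mask=causal_mask(tokens.size(1)))

def encoder_only_predictions(model, tokens):
    """One pass per prefix, full attention within each prefix (more compute)."""
    outputs = []
    for t in range(1, tokens.size(1) + 1):
        out = model(tokens[:, :t])        # no mask: bidirectional attention
        outputs.append(out[:, -1])        # prediction for the next token
    return torch.stack(outputs, dim=1)
```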
Looped Transformers for Length Generalization
Recent work has shown that Transformers trained from scratch can successfully solve various arithmetic and algorithmic tasks, such as adding numbers and computing parity. While these Transformers generalize well on unseen inputs of the sam…
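A minimal sketch of the looped-transformer idea in general terms (the block choice, loop count, and stopping rule here are placeholders, not the paper's recipe): a single weight-tied block is applied repeatedly, and the number of iterations can be increased at inference time for longer or harder inputs.

```python
# Minimal weight-tied loop over one Transformer block (PyTorch).
import torch.nn as nn

class LoopedTransformer(nn.Module):
    def __init__(self, d_model=128, nhead=4, num_loops=8):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True)
        self.num_loops = num_loops

    def forward(self, x, num_loops=None):
        # Reuse the same block's weights for every iteration.
        steps = num_loops if num_loops is not None else self.num_loops
        for _ in range(steps):
            x = self.block(x)
        return x
```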
Buffer-based Gradient Projection for Continual Federated Learning
Continual Federated Learning (CFL) is essential for enabling real-world applications where multiple decentralized clients adaptively learn from continuous data streams. A significant challenge in CFL is mitigating catastrophic forgetting, …
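A generic sketch of buffer-based gradient projection in the spirit of A-GEM, shown only to make the "projection" idea concrete; it is not necessarily the paper's exact update rule: if the current gradient conflicts with the gradient computed on buffered past samples, the conflicting component is projected out before the optimizer step.

```python
# A-GEM-style projection (generic illustration).
import torch

def project_gradient(grad, buffer_grad, eps=1e-12):
    """grad, buffer_grad: flattened gradient vectors of equal length."""
    dot = torch.dot(grad, buffer_grad)
    if dot < 0:  # current update would conflict with past (buffered) data
        grad = grad - (dot / (buffer_grad.norm() ** 2 + eps)) * buffer_grad
    return grad

def flatten_grads(model):
    """Concatenate all parameter gradients into one vector."""
    return torch.cat([p.grad.reshape(-1) for p in model.parameters()
                      if p.grad is not None])
```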
From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data
Recent studies have shown that Large Language Models (LLMs) struggle to accurately retrieve information and maintain reasoning capabilities when processing long-context inputs. To address these limitations, we propose a finetuning approach…
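An illustrative generator for synthetic key-value retrieval data of the kind such finetuning could use; the format details (UUID keys, integer values, prompt wording) are assumptions for illustration, not taken from the paper.

```python
# Each example embeds many random key-value pairs and asks for the value of one key.
import random
import uuid

def make_example(num_pairs=50, seed=None):
    rng = random.Random(seed)
    pairs = {str(uuid.UUID(int=rng.getrandbits(128))): rng.randint(0, 10**6)
             for _ in range(num_pairs)}
    query_key = rng.choice(list(pairs))
    context = "\n".join(f"{k}: {v}" for k, v in pairs.items())
    prompt = (f"Below is a list of key-value pairs.\n{context}\n\n"
              f"What is the value for key {query_key}?")
    return {"prompt": prompt, "answer": str(pairs[query_key])}

example = make_example(num_pairs=5, seed=0)
print(example["prompt"])
print("->", example["answer"])
```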
Dual Operating Modes of In-Context Learning
In-context learning (ICL) exhibits dual operating modes: task learning, i.e., acquiring a new skill from in-context samples, and task retrieval, i.e., locating and activating a relevant pretrained skill. Recent theoretical work investigate…
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
State-space models (SSMs), such as Mamba (Gu & Dao, 2023), have been proposed as alternatives to Transformer networks in language modeling, by incorporating gating, convolutions, and input-dependent token selection to mitigate the quadrati…
Can MLLMs Perform Text-to-Image In-Context Learning?
The evolution from Large Language Models (LLMs) to Multimodal Large Language Models (MLLMs) has spurred research into extending In-Context Learning (ICL) to its multimodal counterpart. Existing studies of this kind have primarily concentrated on i…
Looped Transformers are Better at Learning Learning Algorithms
Transformers have demonstrated effectiveness at solving data-fitting problems in-context from various (latent) models, as reported by Garg et al. However, the absence of an inherent iterative structure in the transformer architecture prese…
Super-Resolution Emulation of Large Cosmological Fields with a 3D Conditional Diffusion Model
High-resolution (HR) simulations in cosmology, in particular when including baryons, can take millions of CPU hours. On the other hand, low-resolution (LR) dark matter simulations of the same cosmological volume use minimal computing resou…
Image Clustering Conditioned on Text Criteria
Classical clustering methods do not provide users with direct control of the clustering results, and the clustering results may not be consistent with the relevant criterion that a user has in mind. In this work, we present a new methodolo…
The Expressive Power of Low-Rank Adaptation
Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method that leverages low-rank adaptation of weight matrices, has emerged as a prevalent technique for fine-tuning pre-trained models such as large language models and diffusion…
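For reference, the standard LoRA formulation the abstract refers to: the frozen weight $W$ is adapted by a rank-$r$ product $BA$, giving an effective weight $W + \frac{\alpha}{r} BA$. The sketch below shows that formulation; hyperparameter values are illustrative.

```python
# Minimal LoRA wrapper around a frozen linear layer (PyTorch).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)              # freeze pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # starts at zero update
        self.scale = alpha / r

    def forward(self, x):
        # Effective weight: W + (alpha / r) * B @ A
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```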
Market Structure Analysis of Revenue of International Construction Professional Service (I-CPS): A Country-Level Analysis
International construction professional service (I-CPS) refers to a knowledge-intensive professional service (KIPS), such as architecture, engineering, and consultancy, which uses technology/human capital as its major input and is better p…
Intuitive Access to Smartphone Settings Using Relevance Model Trained by Contrastive Learning
The more new features are added to smartphones, the harder it becomes for users to find them. This is because feature names are usually short, and there are just too many to remember. In such cases, users may want to as…
Predictive Pipelined Decoding: A Compute-Latency Trade-off for Exact LLM Decoding
This paper presents "Predictive Pipelined Decoding (PPD)," an approach that speeds up greedy decoding in Large Language Models (LLMs) while maintaining the exact same output as the original decoding. Unlike conventional strategies, PPD emp…
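For context, this is the standard greedy-decoding loop that PPD is described as accelerating while preserving the exact same output; the sketch is the baseline only, not the PPD method, and `model` returning a `(batch, seq, vocab)` logits tensor is an assumption.

```python
# Baseline greedy decoding loop (illustration; `model` is a placeholder).
import torch

@torch.no_grad()
def greedy_decode(model, input_ids, max_new_tokens=32, eos_id=None):
    ids = input_ids
    for _ in range(max_new_tokens):
        logits = model(ids)                                  # (batch, seq, vocab) assumed
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=1)
        if eos_id is not None and (next_id == eos_id).all():
            break
    return ids
```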
Mini-Batch Optimization of Contrastive Loss
Contrastive learning has gained significant attention as a method for self-supervised learning. The contrastive loss function ensures that embeddings of positive sample pairs (e.g., different samples from the same class or different views …
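A standard mini-batch contrastive (InfoNCE-style) loss, shown to make the setup concrete; this is the generic formulation commonly used in self-supervised learning, not the paper's specific analysis.

```python
# Within a batch, each anchor's positive is its augmented view; all other
# samples in the batch act as negatives.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """z1, z2: (batch, dim) embeddings of two views of the same samples."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / temperature          # (batch, batch) similarity matrix
    labels = torch.arange(z1.size(0))         # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

z1, z2 = torch.randn(32, 128), torch.randn(32, 128)
print(info_nce(z1, z2).item())
```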
Teaching Arithmetic to Small Transformers
Large language models like GPT-4 exhibit emergent capabilities across general-purpose tasks, such as basic arithmetic, when trained on extensive text data, even though these tasks are not explicitly encoded by the unsupervised, next-token …
Variation Spaces for Multi-Output Neural Networks: Insights on Multi-Task Learning and Network Compression
This paper introduces a novel theoretical framework for the analysis of vector-valued neural networks through the development of vector-valued variation spaces, a new class of reproducing kernel Banach spaces. These spaces emerge from stud…
DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models
Learning from human feedback has been shown to improve text-to-image models. These techniques first learn a reward function that captures what humans care about in the task and then improve the models based on the learned reward function. …