Dynamic Gradient Alignment for Online Data Mixing

arXiv (Cornell University) • October 2024 • Simin Fan, David Grangier, Pierre Ablin

The composition of training data mixtures is critical for effectively training large language models (LLMs), as it directly impacts their performance on downstream tasks. Our goal is to identify an optimal data mixture to specialize an LLM for a specific task with access to only a few examples. Traditional approaches to this problem include ad-hoc reweighting methods, importance sampling, and gradient alignment techniques. This paper focuses on gradient alignment and introduces Dynamic Gradient Alignment (DGA), a …
