arXiv (Cornell University)
Margin-aware Preference Optimization for Aligning Diffusion Models without Reference
June 2024 • Jiwoo Hong, Sayak Paul, Noah Lee, Kashif Rasul, James H. Thorne, Jongheon Jeong
Modern preference alignment methods, such as DPO, rely on divergence regularization to a reference model for training stability, but this creates a fundamental problem we call "reference mismatch." In this paper, we investigate the negative impacts of reference mismatch in aligning text-to-image (T2I) diffusion models, showing that larger reference mismatch hinders effective adaptation given the same amount of data, e.g., when learning new artistic styles or personalizing to specific objects. We demonstrate thi…