Preference Optimization with Multi-Sample Comparisons

arXiv (Cornell University), October 2024
Chaoqi Wang, Zhimiao Zhao, Chen Zhu, Karthik Abinav Sankararaman, Michal Vaľko, Xuefei Cao, Zhaorun Chen, Madian Khabsa, Y. Q. Chen, Hao Ma, Sinong W…

Recent advancements in generative models, particularly large language models (LLMs) and diffusion models, have been driven by extensive pretraining on large datasets followed by post-training. However, current post-training methods such as reinforcement learning from human feedback (RLHF) and direct alignment from preference methods (DAP) primarily utilize single-sample comparisons. These approaches often fail to capture critical characteristics such as generative diversity and bias, which are more accurately asse…
