arXiv (Cornell University)
Preference Optimization with Multi-Sample Comparisons
October 2024 • Chaoqi Wang, Zhimiao Zhao, Chen Zhu, Karthik Abinav Sankararaman, Michal Vaľko, Xuefei Cao, Zhaorun Chen, Madian Khabsa, Y. Q. Chen, Hao Ma, Sinong W…
Recent advancements in generative models, particularly large language models (LLMs) and diffusion models, have been driven by extensive pretraining on large datasets followed by post-training. However, current post-training methods such as reinforcement learning from human feedback (RLHF) and direct alignment from preference methods (DAP) primarily utilize single-sample comparisons. These approaches often fail to capture critical characteristics such as generative diversity and bias, which are more accurately asse…
Topics: Statistics, Mathematics, Computer Science, Chemistry, Chromatography