Yinzhuo Chen
PMoL: Parameter Efficient MoE for Preference Mixing of LLM Alignment
Reinforcement Learning from Human Feedback (RLHF) has proven effective for preference alignment of large language models (LLMs) and is widely used in their post-training process. However, RLHF struggles with hand…