PMoL: Parameter Efficient MoE for Preference Mixing of LLM Alignment
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2411.01245
Reinforcement Learning from Human Feedback (RLHF) has proven to be an effective method for aligning large language models (LLMs) with human preferences and is widely used in LLM post-training. However, RLHF struggles to handle multiple competing preferences, which reduces how well LLMs align with human preferences. To address this issue, we propose Preference Mixture of LoRAs (PMoL), an approach at the level of model architecture that can mix any number of preferences. PMoL combines Mixture of Experts (MoE) with Low-Rank Adaptation (LoRA). This is a novel application of that architecture to preference alignment, and it yields a significant performance improvement. An expert group soft loss gives the MoE the ability to mix preferences. Comprehensive evaluation with a reward model and GPT-4o shows that PMoL mixes preferences better than baseline methods, achieving better preference alignment at lower training cost.
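To make the MoE-plus-LoRA combination in the abstract concrete, here is a minimal sketch of a generic mixture-of-LoRA-experts layer in plain Python. It is not the authors' implementation: the class name `MixtureOfLoRAs`, the dense softmax router, the layer sizes, and the initialization are all assumptions chosen for illustration, and the expert group soft loss is not modeled because the abstract does not define it.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class MixtureOfLoRAs:
    """Hypothetical layer: frozen base weight plus router-weighted LoRA experts."""

    def __init__(self, d_in, d_out, rank, num_experts, seed=0):
        rng = random.Random(seed)
        self.base = random_matrix(d_out, d_in, rng)          # frozen weight W
        self.router = random_matrix(num_experts, d_in, rng)  # one score row per expert
        # Each expert is a low-rank pair: down (rank x d_in), up (d_out x rank).
        # The up-projection starts at zero, as is usual for LoRA.
        self.experts = [
            (random_matrix(rank, d_in, rng), [[0.0] * rank for _ in range(d_out)])
            for _ in range(num_experts)
        ]

    def gate(self, x):
        """Soft routing weights over experts (assumed dense softmax routing)."""
        return softmax(matvec(self.router, x))

    def forward(self, x):
        """Compute W x + sum_i g_i(x) * up_i (down_i x)."""
        out = matvec(self.base, x)
        for weight, (down, up) in zip(self.gate(x), self.experts):
            delta = matvec(up, matvec(down, x))
            out = [o + weight * d for o, d in zip(out, delta)]
        return out


layer = MixtureOfLoRAs(d_in=4, d_out=3, rank=2, num_experts=3)
x = [1.0, -0.5, 0.25, 2.0]
gates = layer.gate(x)
y = layer.forward(x)
```

Because every up-projection starts at zero, the freshly built layer reproduces the frozen base output exactly; only the small low-rank matrices and the router would be trained, which is where the parameter efficiency comes from.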
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2411.01245, https://arxiv.org/pdf/2411.01245
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4404351351
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4404351351 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2411.01245 (Digital Object Identifier)
- Title: PMoL: Parameter Efficient MoE for Preference Mixing of LLM Alignment (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-11-02 (full publication date if available)
- Authors: Deyu Liu, Bing Xu, Yinzhuo Chen, Baiping Xu, Wenpeng Lü, Muyun Yang, Tiejun Zhao (list of authors in order)
- Landing page: https://arxiv.org/abs/2411.01245 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2411.01245 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2411.01245 (direct OA link when available)
- Concepts: Mixing (physics), Preference, Mathematics, Computer science, Physics, Statistics, Quantum mechanics (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4404351351 |
| doi | https://doi.org/10.48550/arxiv.2411.01245 |
| ids.doi | https://doi.org/10.48550/arxiv.2411.01245 |
| ids.openalex | https://openalex.org/W4404351351 |
| fwci | |
| type | preprint |
| title | PMoL: Parameter Efficient MoE for Preference Mixing of LLM Alignment |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10181 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9793000221252441 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Natural Language Processing Techniques |
| topics[1].id | https://openalex.org/T11063 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.953499972820282 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1703 |
| topics[1].subfield.display_name | Computational Theory and Mathematics |
| topics[1].display_name | Rough Sets and Fuzzy Logic |
| topics[2].id | https://openalex.org/T13999 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9232000112533569 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1710 |
| topics[2].subfield.display_name | Information Systems |
| topics[2].display_name | Digital Rights Management and Security |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C138777275 |
| concepts[0].level | 2 |
| concepts[0].score | 0.7729140520095825 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q6884054 |
| concepts[0].display_name | Mixing (physics) |
| concepts[1].id | https://openalex.org/C2781249084 |
| concepts[1].level | 2 |
| concepts[1].score | 0.6197450160980225 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q908656 |
| concepts[1].display_name | Preference |
| concepts[2].id | https://openalex.org/C33923547 |
| concepts[2].level | 0 |
| concepts[2].score | 0.3845593333244324 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[2].display_name | Mathematics |
| concepts[3].id | https://openalex.org/C41008148 |
| concepts[3].level | 0 |
| concepts[3].score | 0.32169491052627563 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[3].display_name | Computer science |
| concepts[4].id | https://openalex.org/C121332964 |
| concepts[4].level | 0 |
| concepts[4].score | 0.2775244116783142 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q413 |
| concepts[4].display_name | Physics |
| concepts[5].id | https://openalex.org/C105795698 |
| concepts[5].level | 1 |
| concepts[5].score | 0.2504141330718994 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q12483 |
| concepts[5].display_name | Statistics |
| concepts[6].id | https://openalex.org/C62520636 |
| concepts[6].level | 1 |
| concepts[6].score | 0.0 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q944 |
| concepts[6].display_name | Quantum mechanics |
| keywords[0].id | https://openalex.org/keywords/mixing |
| keywords[0].score | 0.7729140520095825 |
| keywords[0].display_name | Mixing (physics) |
| keywords[1].id | https://openalex.org/keywords/preference |
| keywords[1].score | 0.6197450160980225 |
| keywords[1].display_name | Preference |
| keywords[2].id | https://openalex.org/keywords/mathematics |
| keywords[2].score | 0.3845593333244324 |
| keywords[2].display_name | Mathematics |
| keywords[3].id | https://openalex.org/keywords/computer-science |
| keywords[3].score | 0.32169491052627563 |
| keywords[3].display_name | Computer science |
| keywords[4].id | https://openalex.org/keywords/physics |
| keywords[4].score | 0.2775244116783142 |
| keywords[4].display_name | Physics |
| keywords[5].id | https://openalex.org/keywords/statistics |
| keywords[5].score | 0.2504141330718994 |
| keywords[5].display_name | Statistics |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2411.01245 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2411.01245 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2411.01245 |
| locations[1].id | doi:10.48550/arxiv.2411.01245 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2411.01245 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5013126069 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-2578-9525 |
| authorships[0].author.display_name | Deyu Liu |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Liu, Dongxu |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5101926912 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-2094-7002 |
| authorships[1].author.display_name | Bing Xu |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Xu, Bing |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5079842923 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Yinzhuo Chen |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Chen, Yinzhuo |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5021958428 |
| authorships[3].author.orcid | https://orcid.org/0000-0003-2891-5635 |
| authorships[3].author.display_name | Baiping Xu |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Xu, Bufan |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5076564877 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-1840-3540 |
| authorships[4].author.display_name | Wenpeng Lü |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Lu, Wenpeng |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5108053327 |
| authorships[5].author.orcid | https://orcid.org/0000-0002-5940-0266 |
| authorships[5].author.display_name | Muyun Yang |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Yang, Muyun |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5038560914 |
| authorships[6].author.orcid | https://orcid.org/0000-0002-1754-5416 |
| authorships[6].author.display_name | Tiejun Zhao |
| authorships[6].author_position | last |
| authorships[6].raw_author_name | Zhao, Tiejun |
| authorships[6].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2411.01245 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | PMoL: Parameter Efficient MoE for Preference Mixing of LLM Alignment |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10181 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9793000221252441 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Natural Language Processing Techniques |
| related_works | https://openalex.org/W2899084033, https://openalex.org/W2748952813, https://openalex.org/W4391375266, https://openalex.org/W1979597421, https://openalex.org/W2007980826, https://openalex.org/W2061531152, https://openalex.org/W3002753104, https://openalex.org/W2077600819, https://openalex.org/W2142036596, https://openalex.org/W2072657027 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2411.01245 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2411.01245 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2411.01245 |
| primary_location.id | pmh:oai:arXiv.org:2411.01245 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2411.01245 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2411.01245 |
| publication_date | 2024-11-02 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 43 |
| abstract_inverted_index.To | 53 |
| abstract_inverted_index.an | 11 |
| abstract_inverted_index.be | 10 |
| abstract_inverted_index.by | 127 |
| abstract_inverted_index.in | 26, 45 |
| abstract_inverted_index.is | 23, 93, 113 |
| abstract_inverted_index.of | 17, 30, 48, 61, 67, 76, 83, 99 |
| abstract_inverted_index.to | 9, 42, 73, 78, 96, 115, 121, 145 |
| abstract_inverted_index.we | 57 |
| abstract_inverted_index.Low | 87 |
| abstract_inverted_index.MoE | 117 |
| abstract_inverted_index.The | 108 |
| abstract_inverted_index.and | 22, 86, 102, 131 |
| abstract_inverted_index.any | 74 |
| abstract_inverted_index.can | 71 |
| abstract_inverted_index.for | 14 |
| abstract_inverted_index.has | 6, 103, 139 |
| abstract_inverted_index.mix | 122 |
| abstract_inverted_index.the | 27, 46, 65, 97, 119, 128, 133 |
| abstract_inverted_index.LLMs | 49 |
| abstract_inverted_index.PMoL | 80, 138, 148 |
| abstract_inverted_index.RLHF | 33 |
| abstract_inverted_index.Rank | 88 |
| abstract_inverted_index.This | 40, 91 |
| abstract_inverted_index.been | 7 |
| abstract_inverted_index.from | 2, 64 |
| abstract_inverted_index.loss | 112 |
| abstract_inverted_index.mix. | 79 |
| abstract_inverted_index.show | 136 |
| abstract_inverted_index.soft | 111 |
| abstract_inverted_index.that | 137 |
| abstract_inverted_index.this | 55 |
| abstract_inverted_index.used | 25, 114 |
| abstract_inverted_index.with | 35, 50, 118, 153 |
| abstract_inverted_index.(MoE) | 85 |
| abstract_inverted_index.Human | 3 |
| abstract_inverted_index.LLMs. | 31 |
| abstract_inverted_index.LoRAs | 62 |
| abstract_inverted_index.adapt | 72 |
| abstract_inverted_index.group | 110 |
| abstract_inverted_index.human | 51 |
| abstract_inverted_index.large | 18 |
| abstract_inverted_index.leads | 41 |
| abstract_inverted_index.lower | 154 |
| abstract_inverted_index.model | 68, 130 |
| abstract_inverted_index.which | 70 |
| abstract_inverted_index.(LLMs) | 21 |
| abstract_inverted_index.(PMoL) | 63 |
| abstract_inverted_index.(RLHF) | 5 |
| abstract_inverted_index.better | 150 |
| abstract_inverted_index.costs. | 156 |
| abstract_inverted_index.enable | 116 |
| abstract_inverted_index.expert | 109 |
| abstract_inverted_index.issue, | 56 |
| abstract_inverted_index.method | 13 |
| abstract_inverted_index.mixing | 142 |
| abstract_inverted_index.models | 20 |
| abstract_inverted_index.number | 75 |
| abstract_inverted_index.proven | 8 |
| abstract_inverted_index.reward | 129 |
| abstract_inverted_index.widely | 24 |
| abstract_inverted_index.(LoRA). | 90 |
| abstract_inverted_index.Adaptor | 89 |
| abstract_inverted_index.Experts | 84 |
| abstract_inverted_index.GPT-4o, | 132 |
| abstract_inverted_index.Mixture | 60, 82 |
| abstract_inverted_index.Through | 124 |
| abstract_inverted_index.ability | 120 |
| abstract_inverted_index.address | 54 |
| abstract_inverted_index.applied | 95 |
| abstract_inverted_index.process | 29 |
| abstract_inverted_index.propose | 58 |
| abstract_inverted_index.results | 135 |
| abstract_inverted_index.Feedback | 4 |
| abstract_inverted_index.However, | 32 |
| abstract_inverted_index.Learning | 1 |
| abstract_inverted_index.achieved | 104 |
| abstract_inverted_index.achieves | 149 |
| abstract_inverted_index.baseline | 146 |
| abstract_inverted_index.combines | 81 |
| abstract_inverted_index.compared | 144 |
| abstract_inverted_index.decrease | 44 |
| abstract_inverted_index.handling | 36 |
| abstract_inverted_index.language | 19 |
| abstract_inverted_index.methods. | 147 |
| abstract_inverted_index.multiple | 37 |
| abstract_inverted_index.research | 98 |
| abstract_inverted_index.superior | 140 |
| abstract_inverted_index.training | 155 |
| abstract_inverted_index.alignment | 16, 47, 101, 152 |
| abstract_inverted_index.competing | 38 |
| abstract_inverted_index.effective | 12 |
| abstract_inverted_index.struggles | 34 |
| abstract_inverted_index.Preference | 59 |
| abstract_inverted_index.evaluation | 126 |
| abstract_inverted_index.experiment | 134 |
| abstract_inverted_index.preference | 15, 100, 141, 151 |
| abstract_inverted_index.performance | 106 |
| abstract_inverted_index.perspective | 66 |
| abstract_inverted_index.preferences | 77 |
| abstract_inverted_index.significant | 105 |
| abstract_inverted_index.architecture | 92 |
| abstract_inverted_index.capabilities | 143 |
| abstract_inverted_index.improvement. | 107 |
| abstract_inverted_index.innovatively | 94 |
| abstract_inverted_index.preferences. | 39, 52, 123 |
| abstract_inverted_index.Reinforcement | 0 |
| abstract_inverted_index.architecture, | 69 |
| abstract_inverted_index.comprehensive | 125 |
| abstract_inverted_index.post-training | 28 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 7 |
| citation_normalized_percentile |
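
The `abstract_inverted_index.*` rows in the table map each word of the abstract to the positions where it occurs. A minimal sketch of turning such an index back into running text follows; the function name `rebuild_abstract` is hypothetical, and the sample dictionary copies only the first nine positions from the table, so later positions of repeated words are deliberately left out.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a mapping of word -> list of integer positions."""
    positioned = []
    for word, positions in inverted_index.items():
        for position in positions:
            positioned.append((position, word))
    # Sorting by position restores the original word order.
    positioned.sort()
    return " ".join(word for _, word in positioned)


# Positions 0-8 only; "from" and "has" also occur later in the full index.
sample_index = {
    "been": [7],
    "proven": [8],
    "from": [2],
    "has": [6],
    "Human": [3],
    "(RLHF)": [5],
    "Feedback": [4],
    "Learning": [1],
    "Reinforcement": [0],
}

opening_words = rebuild_abstract(sample_index)
print(opening_words)
# prints "Reinforcement Learning from Human Feedback (RLHF) has been proven"
```

Passing the complete index from the payload to the same function would yield the full abstract as OpenAlex stores it.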