Direct Language Model Alignment from Online AI Feedback
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2402.04792
Direct alignment from preferences (DAP) methods, such as DPO, have recently emerged as efficient alternatives to reinforcement learning from human feedback (RLHF), that do not require a separate reward model. However, the preference datasets used in DAP methods are usually collected ahead of training and never updated, thus the feedback is purely offline. Moreover, responses in these datasets are often sampled from a language model distinct from the one being aligned, and since the model evolves over training, the alignment phase is inevitably off-policy. In this study, we posit that online feedback is key and improves DAP methods. Our method, online AI feedback (OAIF), uses an LLM as annotator: on each training iteration, we sample two responses from the current model and prompt the LLM annotator to choose which one is preferred, thus providing online feedback. Despite its simplicity, we demonstrate via human evaluation in several tasks that OAIF outperforms both offline DAP and RLHF methods. We further show that the feedback leveraged in OAIF is easily controllable, via instruction prompts to the LLM annotator.
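The training loop described in the abstract (sample two responses from the current model, ask an annotator which one is preferred, apply a DAP update) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a single prompt with a fixed set of candidate responses, a categorical softmax policy over those candidates, a DPO-style loss, and a hypothetical `toy_annotator` that stands in for the LLM annotator by preferring shorter text. The names `oaif_dpo_train`, `toy_annotator`, `beta`, `lr`, and `steps` are all illustrative assumptions.

```python
import math
import random


def log_softmax(logits):
    """Numerically stable log-probabilities of a categorical policy."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def toy_annotator(response_a, response_b):
    """Hypothetical stand-in for the LLM annotator: prefers the shorter text.

    Returns 0 if response_a is preferred, 1 otherwise.
    """
    return 0 if len(response_a) <= len(response_b) else 1


def oaif_dpo_train(responses, steps=500, beta=0.1, lr=0.5, seed=0):
    """Toy online-feedback loop with a DPO-style update (assumed setup)."""
    rng = random.Random(seed)
    n = len(responses)
    logits = [0.0] * n               # trainable policy parameters
    ref_logp = log_softmax(logits)   # frozen reference policy (initial policy)
    for _ in range(steps):
        logp = log_softmax(logits)
        probs = [math.exp(lp) for lp in logp]
        # Online, on-policy: both responses come from the *current* policy.
        i, j = rng.choices(range(n), weights=probs, k=2)
        if i == j:
            continue  # identical samples carry no preference signal
        # Online feedback: the annotator labels the freshly sampled pair.
        if toy_annotator(responses[i], responses[j]) == 0:
            win, lose = i, j
        else:
            win, lose = j, i
        # DPO loss is -log(sigmoid(h)); for a softmax policy its gradient
        # w.r.t. the logits is nonzero only at the winner and the loser.
        h = beta * ((logp[win] - ref_logp[win]) - (logp[lose] - ref_logp[lose]))
        step = lr * beta * (1.0 - sigmoid(h))
        logits[win] += step
        logits[lose] -= step
    return [math.exp(lp) for lp in log_softmax(logits)]


if __name__ == "__main__":
    candidates = ["a long and rambling answer", "short", "a medium answer"]
    for text, p in zip(candidates, oaif_dpo_train(candidates)):
        print(f"{p:.3f}  {text}")
```

In this sketch, swapping `toy_annotator` for a different rule changes which response the policy drifts toward, which loosely mirrors the abstract's point that the feedback is controllable through the annotator. A real setup would replace the categorical policy with a language model and the rule with a prompted LLM.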
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2402.04792, https://arxiv.org/pdf/2402.04792
- OA Status: green
- Cited By: 3
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4391673254
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4391673254 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2402.04792 (Digital Object Identifier)
- Title: Direct Language Model Alignment from Online AI Feedback (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-02-07 (full publication date if available)
- Authors: Shangmin Guo, Biao Zhang, Tianlin Liu, Tianqi Liu, Misha Khalman, Felipe Llinares, Alexandre Ramé, Thomas Mesnard, Yao Zhao, Bilal Piot, Johan Ferret, Mathieu Blondel (list of authors in order)
- Landing page: https://arxiv.org/abs/2402.04792 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2402.04792 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2402.04792 (direct OA link when available)
- Concepts: Computer science, Natural language processing, Language model, Artificial intelligence (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 3 (total citation count in OpenAlex)
- Citations by year (recent): 2025: 3 (per-year citation counts, last 5 years)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4391673254 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2402.04792 |
| ids.doi | https://doi.org/10.48550/arxiv.2402.04792 |
| ids.openalex | https://openalex.org/W4391673254 |
| fwci | |
| type | preprint |
| title | Direct Language Model Alignment from Online AI Feedback |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10181 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9714999794960022 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Natural Language Processing Techniques |
| topics[1].id | https://openalex.org/T10028 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9366000294685364 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Topic Modeling |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.6136571764945984 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C204321447 |
| concepts[1].level | 1 |
| concepts[1].score | 0.4751979112625122 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[1].display_name | Natural language processing |
| concepts[2].id | https://openalex.org/C137293760 |
| concepts[2].level | 2 |
| concepts[2].score | 0.47366294264793396 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q3621696 |
| concepts[2].display_name | Language model |
| concepts[3].id | https://openalex.org/C154945302 |
| concepts[3].level | 1 |
| concepts[3].score | 0.43369990587234497 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[3].display_name | Artificial intelligence |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.6136571764945984 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/natural-language-processing |
| keywords[1].score | 0.4751979112625122 |
| keywords[1].display_name | Natural language processing |
| keywords[2].id | https://openalex.org/keywords/language-model |
| keywords[2].score | 0.47366294264793396 |
| keywords[2].display_name | Language model |
| keywords[3].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[3].score | 0.43369990587234497 |
| keywords[3].display_name | Artificial intelligence |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2402.04792 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2402.04792 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2402.04792 |
| locations[1].id | doi:10.48550/arxiv.2402.04792 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2402.04792 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5065424581 |
| authorships[0].author.orcid | https://orcid.org/0000-0003-1716-0994 |
| authorships[0].author.display_name | Shangmin Guo |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Guo, Shangmin |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5100363719 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-4865-7090 |
| authorships[1].author.display_name | Biao Zhang |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Zhang, Biao |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5101848358 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-3485-2629 |
| authorships[2].author.display_name | Tianlin Liu |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Liu, Tianlin |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5100618135 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-0224-045X |
| authorships[3].author.display_name | Tianqi Liu |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Liu, Tianqi |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5058295788 |
| authorships[4].author.orcid | |
| authorships[4].author.display_name | Misha Khalman |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Khalman, Misha |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5039330263 |
| authorships[5].author.orcid | |
| authorships[5].author.display_name | Felipe Llinares |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Llinares, Felipe |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5057217987 |
| authorships[6].author.orcid | |
| authorships[6].author.display_name | Alexandre Ramé |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Rame, Alexandre |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5082334070 |
| authorships[7].author.orcid | |
| authorships[7].author.display_name | Thomas Mesnard |
| authorships[7].author_position | middle |
| authorships[7].raw_author_name | Mesnard, Thomas |
| authorships[7].is_corresponding | False |
| authorships[8].author.id | https://openalex.org/A5009318707 |
| authorships[8].author.orcid | https://orcid.org/0000-0001-9370-7934 |
| authorships[8].author.display_name | Yao Zhao |
| authorships[8].author_position | middle |
| authorships[8].raw_author_name | Zhao, Yao |
| authorships[8].is_corresponding | False |
| authorships[9].author.id | https://openalex.org/A5051968502 |
| authorships[9].author.orcid | https://orcid.org/0000-0002-6456-7183 |
| authorships[9].author.display_name | Bilal Piot |
| authorships[9].author_position | middle |
| authorships[9].raw_author_name | Piot, Bilal |
| authorships[9].is_corresponding | False |
| authorships[10].author.id | https://openalex.org/A5087706654 |
| authorships[10].author.orcid | |
| authorships[10].author.display_name | Johan Ferret |
| authorships[10].author_position | middle |
| authorships[10].raw_author_name | Ferret, Johan |
| authorships[10].is_corresponding | False |
| authorships[11].author.id | https://openalex.org/A5049123454 |
| authorships[11].author.orcid | https://orcid.org/0000-0002-2366-2993 |
| authorships[11].author.display_name | Mathieu Blondel |
| authorships[11].author_position | last |
| authorships[11].raw_author_name | Blondel, Mathieu |
| authorships[11].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2402.04792 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Direct Language Model Alignment from Online AI Feedback |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10181 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9714999794960022 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Natural Language Processing Techniques |
| related_works | https://openalex.org/W2748952813, https://openalex.org/W2390279801, https://openalex.org/W2358668433, https://openalex.org/W2376932109, https://openalex.org/W2001405890, https://openalex.org/W2382290278, https://openalex.org/W2478288626, https://openalex.org/W2350741829, https://openalex.org/W2530322880, https://openalex.org/W3204019825 |
| cited_by_count | 3 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 3 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2402.04792 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2402.04792 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2402.04792 |
| primary_location.id | pmh:oai:arXiv.org:2402.04792 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2402.04792 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2402.04792 |
| publication_date | 2024-02-07 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 26, 62 |
| abstract_inverted_index.AI | 101 |
| abstract_inverted_index.In | 84 |
| abstract_inverted_index.We | 156 |
| abstract_inverted_index.an | 105 |
| abstract_inverted_index.as | 7, 12, 107 |
| abstract_inverted_index.do | 23 |
| abstract_inverted_index.in | 35, 55, 144, 163 |
| abstract_inverted_index.is | 50, 81, 92, 130, 165 |
| abstract_inverted_index.of | 42 |
| abstract_inverted_index.on | 109 |
| abstract_inverted_index.to | 15, 126, 171 |
| abstract_inverted_index.we | 87, 113, 139 |
| abstract_inverted_index.DAP | 36, 96, 152 |
| abstract_inverted_index.LLM | 106, 124, 173 |
| abstract_inverted_index.Our | 98 |
| abstract_inverted_index.and | 44, 71, 94, 121, 153 |
| abstract_inverted_index.are | 38, 58 |
| abstract_inverted_index.its | 137 |
| abstract_inverted_index.key | 93 |
| abstract_inverted_index.not | 24 |
| abstract_inverted_index.one | 68, 129 |
| abstract_inverted_index.the | 31, 48, 67, 73, 78, 118, 123, 160, 172 |
| abstract_inverted_index.two | 115 |
| abstract_inverted_index.via | 141, 168 |
| abstract_inverted_index.DPO, | 8 |
| abstract_inverted_index.OAIF | 148, 164 |
| abstract_inverted_index.RLHF | 154 |
| abstract_inverted_index.both | 150 |
| abstract_inverted_index.each | 110 |
| abstract_inverted_index.from | 2, 18, 61, 66, 117 |
| abstract_inverted_index.have | 9 |
| abstract_inverted_index.over | 76 |
| abstract_inverted_index.show | 158 |
| abstract_inverted_index.such | 6 |
| abstract_inverted_index.that | 22, 89, 147, 159 |
| abstract_inverted_index.this | 85 |
| abstract_inverted_index.thus | 47, 132 |
| abstract_inverted_index.used | 34 |
| abstract_inverted_index.uses | 104 |
| abstract_inverted_index.(DAP) | 4 |
| abstract_inverted_index.ahead | 41 |
| abstract_inverted_index.being | 69 |
| abstract_inverted_index.human | 19, 142 |
| abstract_inverted_index.model | 64, 74, 120 |
| abstract_inverted_index.never | 45 |
| abstract_inverted_index.often | 59 |
| abstract_inverted_index.phase | 80 |
| abstract_inverted_index.posit | 88 |
| abstract_inverted_index.since | 72 |
| abstract_inverted_index.tasks | 146 |
| abstract_inverted_index.these | 56 |
| abstract_inverted_index.which | 128 |
| abstract_inverted_index.Direct | 0 |
| abstract_inverted_index.choose | 127 |
| abstract_inverted_index.easily | 166 |
| abstract_inverted_index.model. | 29 |
| abstract_inverted_index.online | 90, 100, 134 |
| abstract_inverted_index.prompt | 122 |
| abstract_inverted_index.purely | 51 |
| abstract_inverted_index.reward | 28 |
| abstract_inverted_index.sample | 114 |
| abstract_inverted_index.study, | 86 |
| abstract_inverted_index.(OAIF), | 103 |
| abstract_inverted_index.(RLHF), | 21 |
| abstract_inverted_index.Despite | 136 |
| abstract_inverted_index.current | 119 |
| abstract_inverted_index.emerged | 11 |
| abstract_inverted_index.evolves | 75 |
| abstract_inverted_index.further | 157 |
| abstract_inverted_index.method, | 99 |
| abstract_inverted_index.methods | 37 |
| abstract_inverted_index.offline | 151 |
| abstract_inverted_index.prompts | 170 |
| abstract_inverted_index.require | 25 |
| abstract_inverted_index.sampled | 60 |
| abstract_inverted_index.several | 145 |
| abstract_inverted_index.usually | 39 |
| abstract_inverted_index.However, | 30 |
| abstract_inverted_index.aligned, | 70 |
| abstract_inverted_index.datasets | 33, 57 |
| abstract_inverted_index.distinct | 65 |
| abstract_inverted_index.feedback | 20, 49, 91, 102, 161 |
| abstract_inverted_index.improves | 95 |
| abstract_inverted_index.language | 63 |
| abstract_inverted_index.learning | 17 |
| abstract_inverted_index.methods, | 5 |
| abstract_inverted_index.methods. | 97, 155 |
| abstract_inverted_index.offline. | 52 |
| abstract_inverted_index.recently | 10 |
| abstract_inverted_index.separate | 27 |
| abstract_inverted_index.training | 43, 111 |
| abstract_inverted_index.updated, | 46 |
| abstract_inverted_index.Moreover, | 53 |
| abstract_inverted_index.alignment | 1, 79 |
| abstract_inverted_index.annotator | 125 |
| abstract_inverted_index.collected | 40 |
| abstract_inverted_index.efficient | 13 |
| abstract_inverted_index.feedback. | 135 |
| abstract_inverted_index.leveraged | 162 |
| abstract_inverted_index.providing | 133 |
| abstract_inverted_index.responses | 54, 116 |
| abstract_inverted_index.training, | 77 |
| abstract_inverted_index.annotator. | 174 |
| abstract_inverted_index.annotator: | 108 |
| abstract_inverted_index.evaluation | 143 |
| abstract_inverted_index.inevitably | 82 |
| abstract_inverted_index.iteration, | 112 |
| abstract_inverted_index.preference | 32 |
| abstract_inverted_index.preferred, | 131 |
| abstract_inverted_index.demonstrate | 140 |
| abstract_inverted_index.instruction | 169 |
| abstract_inverted_index.off-policy. | 83 |
| abstract_inverted_index.outperforms | 149 |
| abstract_inverted_index.preferences | 3 |
| abstract_inverted_index.simplicity, | 138 |
| abstract_inverted_index.alternatives | 14 |
| abstract_inverted_index.controllable, | 167 |
| abstract_inverted_index.reinforcement | 16 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 12 |
| citation_normalized_percentile |