RLTHF: Targeted Human Feedback for LLM Alignment
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2502.13417
Fine-tuning large language models (LLMs) to align with user preferences is challenging due to the high cost of quality human annotations in Reinforcement Learning from Human Feedback (RLHF) and the generalizability limitations of AI Feedback. To address these challenges, we propose RLTHF, a human-AI hybrid framework that combines LLM-based initial alignment with selective human annotations to achieve full-human annotation alignment with minimal effort. RLTHF identifies hard-to-annotate samples mislabeled by LLMs using a reward model's reward distribution and iteratively enhances alignment by integrating strategic human corrections while leveraging LLM's correctly labeled samples. Evaluations on HH-RLHF and TL;DR datasets show that RLTHF reaches full-human annotation-level alignment with only 6-7% of the human annotation effort. Furthermore, models trained on RLTHF's curated datasets for downstream tasks outperform those trained on fully human-annotated datasets, underscoring the effectiveness of RLTHF.
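The abstract says that RLTHF uses a reward model's reward distribution to find samples the LLM probably mislabeled, but it does not state the selection rule. The sketch below is only one plausible reading, not the authors' implementation: it assumes each preference pair already has reward-model scores for the LLM-preferred and LLM-rejected responses, and it sends the pairs with the smallest (or negative) score margin to human annotators. The names `PreferencePair`, `select_for_human_review`, and `budget` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PreferencePair:
    pair_id: str
    reward_chosen: float    # reward-model score of the response the LLM labeled as preferred
    reward_rejected: float  # reward-model score of the response the LLM labeled as rejected


def select_for_human_review(pairs, budget):
    """Return up to `budget` pairs whose LLM label the reward model supports least.

    Assumption (not from the abstract): a small or negative margin
    reward_chosen - reward_rejected marks a pair as likely mislabeled.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    # Sort ascending by margin so the most suspicious pairs come first.
    ranked = sorted(pairs, key=lambda p: p.reward_chosen - p.reward_rejected)
    return ranked[:budget]


pairs = [
    PreferencePair("a", 2.5, 0.5),   # margin  2.0: reward model agrees with the LLM label
    PreferencePair("b", 0.2, 1.2),   # margin -1.0: reward model disagrees
    PreferencePair("c", 0.6, 0.5),   # margin  0.1: nearly undecided
    PreferencePair("d", 0.0, 0.5),   # margin -0.5: reward model disagrees
]
to_review = select_for_human_review(pairs, budget=2)
print([p.pair_id for p in to_review])
```

In an iterative setup like the one the abstract describes, the human-corrected pairs would be merged back into the training set and the reward model retrained before the next selection round; that loop is omitted here.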
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2502.13417, https://arxiv.org/pdf/2502.13417
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4407764293
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4407764293
- DOI: https://doi.org/10.48550/arxiv.2502.13417
- Title: RLTHF: Targeted Human Feedback for LLM Alignment
- Type: preprint
- Language: en
- Publication year: 2025
- Publication date: 2025-02-19
- Authors (in order): Yifei Xu, Tusher Chakraborty, Emre Kıcıman, Bibek Aryal, Eduardo Rodrigues, Srinagesh Sharma, Roberto Estêvão, María Angels de Luis Balaguer, Jessica Wolk, Rafael Padilha, Leonardo Silva Nunes, Shobana Balakrishnan, Songwu Lu, Ranveer Chandra
- Landing page: https://arxiv.org/abs/2502.13417
- PDF URL: https://arxiv.org/pdf/2502.13417
- Open access: Yes
- OA status: green
- OA URL: https://arxiv.org/pdf/2502.13417
- Concepts: Computer science
- Cited by: 0
- Related works (count): 10
Full payload
| id | https://openalex.org/W4407764293 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2502.13417 |
| ids.doi | https://doi.org/10.48550/arxiv.2502.13417 |
| ids.openalex | https://openalex.org/W4407764293 |
| fwci | |
| type | preprint |
| title | RLTHF: Targeted Human Feedback for LLM Alignment |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10215 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.7336999773979187 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Semantic Web and Ontologies |
| topics[1].id | https://openalex.org/T10181 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.6261000037193298 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Natural Language Processing Techniques |
| topics[2].id | https://openalex.org/T14351 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.6137999892234802 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Statistical and Computational Modeling |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.4509413540363312 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.4509413540363312 |
| keywords[0].display_name | Computer science |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2502.13417 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2502.13417 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2502.13417 |
| locations[1].id | doi:10.48550/arxiv.2502.13417 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2502.13417 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5069256680 |
| authorships[0].author.orcid | https://orcid.org/0000-0003-1329-3124 |
| authorships[0].author.display_name | Yifei Xu |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Xu, Yifei |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5043473455 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-1656-5471 |
| authorships[1].author.display_name | Tusher Chakraborty |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Chakraborty, Tusher |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5112594439 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Emre Kıcıman |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Kıcıman, Emre |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5020154361 |
| authorships[3].author.orcid | https://orcid.org/0000-0003-0257-7439 |
| authorships[3].author.display_name | Bibek Aryal |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Aryal, Bibek |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5023708186 |
| authorships[4].author.orcid | https://orcid.org/0000-0003-2846-7625 |
| authorships[4].author.display_name | E. Rodrigues |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Rodrigues, Eduardo |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5103899562 |
| authorships[5].author.orcid | |
| authorships[5].author.display_name | Srinagesh Sharma |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Sharma, Srinagesh |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5073495435 |
| authorships[6].author.orcid | |
| authorships[6].author.display_name | Roberto Estêvão |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Estevao, Roberto |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5012552344 |
| authorships[7].author.orcid | https://orcid.org/0000-0002-6272-7841 |
| authorships[7].author.display_name | María Angels de Luis Balaguer |
| authorships[7].author_position | middle |
| authorships[7].raw_author_name | Balaguer, Maria Angels de Luis |
| authorships[7].is_corresponding | False |
| authorships[8].author.id | https://openalex.org/A5029069424 |
| authorships[8].author.orcid | |
| authorships[8].author.display_name | Joel L. Wolk |
| authorships[8].author_position | middle |
| authorships[8].raw_author_name | Wolk, Jessica |
| authorships[8].is_corresponding | False |
| authorships[9].author.id | https://openalex.org/A5027978607 |
| authorships[9].author.orcid | https://orcid.org/0000-0003-1944-5475 |
| authorships[9].author.display_name | Rafael Padilha |
| authorships[9].author_position | middle |
| authorships[9].raw_author_name | Padilha, Rafael |
| authorships[9].is_corresponding | False |
| authorships[10].author.id | https://openalex.org/A5033463791 |
| authorships[10].author.orcid | https://orcid.org/0009-0009-1296-1013 |
| authorships[10].author.display_name | Leonardo Silva Nunes |
| authorships[10].author_position | middle |
| authorships[10].raw_author_name | Nunes, Leonardo |
| authorships[10].is_corresponding | False |
| authorships[11].author.id | https://openalex.org/A5110322584 |
| authorships[11].author.orcid | |
| authorships[11].author.display_name | Shobana Balakrishnan |
| authorships[11].author_position | middle |
| authorships[11].raw_author_name | Balakrishnan, Shobana |
| authorships[11].is_corresponding | False |
| authorships[12].author.id | https://openalex.org/A5020188879 |
| authorships[12].author.orcid | https://orcid.org/0000-0003-3779-0918 |
| authorships[12].author.display_name | Songwu Lu |
| authorships[12].author_position | middle |
| authorships[12].raw_author_name | Lu, Songwu |
| authorships[12].is_corresponding | False |
| authorships[13].author.id | https://openalex.org/A5112443217 |
| authorships[13].author.orcid | |
| authorships[13].author.display_name | Ranveer Chandra |
| authorships[13].author_position | last |
| authorships[13].raw_author_name | Chandra, Ranveer |
| authorships[13].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2502.13417 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | RLTHF: Targeted Human Feedback for LLM Alignment |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10215 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.7336999773979187 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Semantic Web and Ontologies |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2899084033, https://openalex.org/W2748952813, https://openalex.org/W2390279801, https://openalex.org/W4391913857, https://openalex.org/W2358668433, https://openalex.org/W4396701345, https://openalex.org/W2376932109, https://openalex.org/W2001405890, https://openalex.org/W4396696052 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2502.13417 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2502.13417 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2502.13417 |
| primary_location.id | pmh:oai:arXiv.org:2502.13417 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2502.13417 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2502.13417 |
| publication_date | 2025-02-19 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 42, 71 |
| abstract_inverted_index.AI | 33 |
| abstract_inverted_index.To | 35 |
| abstract_inverted_index.by | 68, 80 |
| abstract_inverted_index.in | 21 |
| abstract_inverted_index.is | 10 |
| abstract_inverted_index.of | 17, 32, 107, 132 |
| abstract_inverted_index.on | 92, 115, 125 |
| abstract_inverted_index.to | 5, 13, 55 |
| abstract_inverted_index.we | 39 |
| abstract_inverted_index.and | 28, 76, 94 |
| abstract_inverted_index.due | 12 |
| abstract_inverted_index.for | 119 |
| abstract_inverted_index.the | 14, 29, 108, 130 |
| abstract_inverted_index.6-7% | 106 |
| abstract_inverted_index.LLMs | 69 |
| abstract_inverted_index.cost | 16 |
| abstract_inverted_index.from | 24 |
| abstract_inverted_index.high | 15 |
| abstract_inverted_index.only | 105 |
| abstract_inverted_index.show | 97 |
| abstract_inverted_index.that | 46, 98 |
| abstract_inverted_index.user | 8 |
| abstract_inverted_index.with | 7, 51, 60, 104 |
| abstract_inverted_index.Human | 25 |
| abstract_inverted_index.LLM's | 87 |
| abstract_inverted_index.RLTHF | 63, 99 |
| abstract_inverted_index.TL;DR | 95 |
| abstract_inverted_index.align | 6 |
| abstract_inverted_index.fully | 126 |
| abstract_inverted_index.human | 19, 53, 83, 109 |
| abstract_inverted_index.large | 1 |
| abstract_inverted_index.tasks | 121 |
| abstract_inverted_index.these | 37 |
| abstract_inverted_index.those | 123 |
| abstract_inverted_index.using | 70 |
| abstract_inverted_index.while | 85 |
| abstract_inverted_index.(LLMs) | 4 |
| abstract_inverted_index.(RLHF) | 27 |
| abstract_inverted_index.RLTHF, | 41 |
| abstract_inverted_index.RLTHF. | 133 |
| abstract_inverted_index.hybrid | 44 |
| abstract_inverted_index.models | 3, 113 |
| abstract_inverted_index.reward | 72, 74 |
| abstract_inverted_index.HH-RLHF | 93 |
| abstract_inverted_index.RLTHF's | 116 |
| abstract_inverted_index.achieve | 56 |
| abstract_inverted_index.address | 36 |
| abstract_inverted_index.curated | 117 |
| abstract_inverted_index.effort. | 62, 111 |
| abstract_inverted_index.initial | 49 |
| abstract_inverted_index.labeled | 89 |
| abstract_inverted_index.minimal | 61 |
| abstract_inverted_index.model's | 73 |
| abstract_inverted_index.propose | 40 |
| abstract_inverted_index.quality | 18 |
| abstract_inverted_index.reaches | 100 |
| abstract_inverted_index.samples | 66 |
| abstract_inverted_index.trained | 114, 124 |
| abstract_inverted_index.Feedback | 26 |
| abstract_inverted_index.Learning | 23 |
| abstract_inverted_index.combines | 47 |
| abstract_inverted_index.datasets | 96, 118 |
| abstract_inverted_index.enhances | 78 |
| abstract_inverted_index.human-AI | 43 |
| abstract_inverted_index.language | 2 |
| abstract_inverted_index.samples. | 90 |
| abstract_inverted_index.Feedback. | 34 |
| abstract_inverted_index.LLM-based | 48 |
| abstract_inverted_index.alignment | 50, 59, 79, 103 |
| abstract_inverted_index.correctly | 88 |
| abstract_inverted_index.datasets, | 128 |
| abstract_inverted_index.framework | 45 |
| abstract_inverted_index.selective | 52 |
| abstract_inverted_index.strategic | 82 |
| abstract_inverted_index.annotation | 58, 110 |
| abstract_inverted_index.downstream | 120 |
| abstract_inverted_index.full-human | 57, 101 |
| abstract_inverted_index.identifies | 64 |
| abstract_inverted_index.leveraging | 86 |
| abstract_inverted_index.mislabeled | 67 |
| abstract_inverted_index.outperform | 122 |
| abstract_inverted_index.Evaluations | 91 |
| abstract_inverted_index.Fine-tuning | 0 |
| abstract_inverted_index.annotations | 20, 54 |
| abstract_inverted_index.challenges, | 38 |
| abstract_inverted_index.challenging | 11 |
| abstract_inverted_index.corrections | 84 |
| abstract_inverted_index.integrating | 81 |
| abstract_inverted_index.iteratively | 77 |
| abstract_inverted_index.limitations | 31 |
| abstract_inverted_index.preferences | 9 |
| abstract_inverted_index.Furthermore, | 112 |
| abstract_inverted_index.distribution | 75 |
| abstract_inverted_index.underscoring | 129 |
| abstract_inverted_index.Reinforcement | 22 |
| abstract_inverted_index.effectiveness | 131 |
| abstract_inverted_index.human-annotated | 127 |
| abstract_inverted_index.annotation-level | 102 |
| abstract_inverted_index.generalizability | 30 |
| abstract_inverted_index.hard-to-annotate | 65 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 14 |
| citation_normalized_percentile | |
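
The `abstract_inverted_index.*` rows above store the abstract as a map from each token to the zero-based positions where it occurs. A minimal sketch of turning such a map back into running text follows. It assumes the positions are available as lists of integers, as in the raw JSON, rather than the comma-separated strings shown in the table. The function name `rebuild_abstract` is hypothetical, and the sample is truncated to positions 0 through 7, so later occurrences of "models" and "to" are left out.

```python
def rebuild_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild running text from a token -> positions mapping."""
    by_position = {}
    for token, positions in inverted_index.items():
        for position in positions:
            by_position[position] = token
    # Emit tokens in position order, separated by single spaces.
    return " ".join(by_position[i] for i in sorted(by_position))


# First eight tokens of the abstract, taken from the table rows above.
sample_index = {
    "Fine-tuning": [0],
    "large": [1],
    "language": [2],
    "models": [3],
    "(LLMs)": [4],
    "to": [5],
    "align": [6],
    "with": [7],
}
print(rebuild_abstract(sample_index))
# prints: Fine-tuning large language models (LLMs) to align with
```

Punctuation stays attached to its token (for example `RLTHF,` and `RLTHF.` are separate keys), so joining with spaces is enough to recover the original sentence boundaries.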