Automated Rewards via LLM-Generated Progress Functions
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2410.09187
Large Language Models (LLMs) have the potential to automate reward engineering by leveraging their broad domain knowledge across various tasks. However, they often need many iterations of trial-and-error to generate effective reward functions. This process is costly because evaluating every sampled reward function requires completing the full policy optimization process for each function. In this paper, we introduce an LLM-driven reward generation framework that is able to produce state-of-the-art policies on the challenging Bi-DexHands benchmark with 20x fewer reward function samples than the prior state-of-the-art work. Our key insight is that we reduce the problem of generating task-specific rewards to the problem of coarsely estimating task progress. Our two-step solution leverages the task domain knowledge and the code synthesis abilities of LLMs to author progress functions that estimate task progress from a given state. Then, we use this notion of progress to discretize states, and generate count-based intrinsic rewards using the low-dimensional state space. We show that the combination of LLM-generated progress functions and count-based intrinsic rewards is essential for our performance gains, while alternatives such as generic hash-based counts or using progress directly as a reward function fall short.
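The two-step idea in the abstract (estimate progress, then count visits to discretized progress levels) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the state keys `hand_to_object_dist` and `initial_dist`, the hand-written `progress` function (standing in for one an LLM might author), the uniform binning, and the `1/sqrt(count)` bonus, which is a common form of count-based intrinsic reward but not necessarily the one the authors use.

```python
import math
from collections import Counter


def progress(state: dict) -> float:
    """Hypothetical progress function: maps a state to a coarse estimate in [0, 1].

    Assumed state layout (illustrative only): the current hand-to-object
    distance and the distance at the start of the episode.
    """
    d = state["hand_to_object_dist"]
    d0 = state["initial_dist"]
    return min(1.0, max(0.0, 1.0 - d / d0))


def discretize(p: float, num_bins: int = 10) -> int:
    """Map a progress value in [0, 1] to one of num_bins uniform bins."""
    return min(num_bins - 1, int(p * num_bins))


class CountBonus:
    """Count-based intrinsic reward over discretized progress (assumed form)."""

    def __init__(self, num_bins: int = 10, scale: float = 1.0):
        self.num_bins = num_bins
        self.scale = scale
        self.counts = Counter()  # visit count per progress bin

    def reward(self, state: dict) -> float:
        # Rarely visited progress levels earn a larger bonus.
        b = discretize(progress(state), self.num_bins)
        self.counts[b] += 1
        return self.scale / math.sqrt(self.counts[b])


bonus = CountBonus(num_bins=10)
halfway = {"hand_to_object_dist": 0.5, "initial_dist": 1.0}
first = bonus.reward(halfway)   # first visit to this bin
second = bonus.reward(halfway)  # repeat visit, smaller bonus
```

Because the counts are kept over a handful of progress bins rather than over raw states, the bonus stays informative even when the underlying state space is high-dimensional, which is the contrast the abstract draws with generic hash-based counts.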
Related Topics
- Type
- preprint
- Language
- en
- Landing Page
- http://arxiv.org/abs/2410.09187
- https://arxiv.org/pdf/2410.09187
- OA Status
- green
- Related Works
- 10
- OpenAlex ID
- https://openalex.org/W4403564379
Raw OpenAlex JSON
- OpenAlex ID
- https://openalex.org/W4403564379
- DOI
- https://doi.org/10.48550/arxiv.2410.09187
- Title
- Automated Rewards via LLM-Generated Progress Functions
- Type
- preprint
- Language
- en
- Publication year
- 2024
- Publication date
- 2024-10-11
- Authors
- Vishnu Sarukkai, Brennan Shacklett, Zander Majercik, Kush Bhatia, Christopher Ré, Kayvon Fatahalian
- Landing page
- https://arxiv.org/abs/2410.09187
- PDF URL
- https://arxiv.org/pdf/2410.09187
- Open access
- Yes
- OA status
- green
- OA URL
- https://arxiv.org/pdf/2410.09187
- Concepts
- Computer science
- Cited by
- 0
- Related works (count)
- 10
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4403564379 |
| doi | https://doi.org/10.48550/arxiv.2410.09187 |
| ids.doi | https://doi.org/10.48550/arxiv.2410.09187 |
| ids.openalex | https://openalex.org/W4403564379 |
| fwci | |
| type | preprint |
| title | Automated Rewards via LLM-Generated Progress Functions |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10054 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9466000199317932 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1708 |
| topics[0].subfield.display_name | Hardware and Architecture |
| topics[0].display_name | Parallel Computing and Optimization Techniques |
| topics[1].id | https://openalex.org/T10142 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9369000196456909 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1703 |
| topics[1].subfield.display_name | Computational Theory and Mathematics |
| topics[1].display_name | Formal Methods in Verification |
| topics[2].id | https://openalex.org/T10906 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9327999949455261 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | AI-based Problem Solving and Planning |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.4568648040294647 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.4568648040294647 |
| keywords[0].display_name | Computer science |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2410.09187 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2410.09187 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2410.09187 |
| locations[1].id | doi:10.48550/arxiv.2410.09187 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2410.09187 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5069897950 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Vishnu Sarukkai |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Sarukkai, Vishnu |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5070537368 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-2894-1158 |
| authorships[1].author.display_name | Brennan Shacklett |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Shacklett, Brennan |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5114331470 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Zander Majercik |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Majercik, Zander |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5114331471 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Kush Bhatia |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Bhatia, Kush |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5103852640 |
| authorships[4].author.orcid | |
| authorships[4].author.display_name | Christopher Ré |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Ré, Christopher |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5037444018 |
| authorships[5].author.orcid | https://orcid.org/0000-0001-8754-0429 |
| authorships[5].author.display_name | Kayvon Fatahalian |
| authorships[5].author_position | last |
| authorships[5].raw_author_name | Fatahalian, Kayvon |
| authorships[5].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2410.09187 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Automated Rewards via LLM-Generated Progress Functions |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10054 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9466000199317932 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1708 |
| primary_topic.subfield.display_name | Hardware and Architecture |
| primary_topic.display_name | Parallel Computing and Optimization Techniques |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2899084033, https://openalex.org/W2748952813, https://openalex.org/W2390279801, https://openalex.org/W4391913857, https://openalex.org/W2358668433, https://openalex.org/W4396701345, https://openalex.org/W2376932109, https://openalex.org/W2001405890, https://openalex.org/W4396696052 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2410.09187 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2410.09187 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2410.09187 |
| primary_location.id | pmh:oai:arXiv.org:2410.09187 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2410.09187 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2410.09187 |
| publication_date | 2024-10-11 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 131, 185 |
| abstract_inverted_index.In | 53 |
| abstract_inverted_index.We | 154 |
| abstract_inverted_index.an | 58 |
| abstract_inverted_index.as | 176, 184 |
| abstract_inverted_index.by | 11 |
| abstract_inverted_index.is | 35, 64, 89, 167 |
| abstract_inverted_index.of | 26, 95, 102, 120, 139, 159 |
| abstract_inverted_index.on | 70 |
| abstract_inverted_index.or | 180 |
| abstract_inverted_index.to | 7, 28, 66, 99, 122, 141 |
| abstract_inverted_index.we | 56, 91, 135 |
| abstract_inverted_index.20x | 76 |
| abstract_inverted_index.Our | 86, 107 |
| abstract_inverted_index.and | 115, 144, 163 |
| abstract_inverted_index.for | 50, 169 |
| abstract_inverted_index.key | 87 |
| abstract_inverted_index.our | 170 |
| abstract_inverted_index.the | 5, 45, 71, 82, 93, 100, 111, 116, 150, 157 |
| abstract_inverted_index.use | 136 |
| abstract_inverted_index.LLMs | 121 |
| abstract_inverted_index.This | 33 |
| abstract_inverted_index.able | 65 |
| abstract_inverted_index.code | 117 |
| abstract_inverted_index.each | 51 |
| abstract_inverted_index.fall | 188 |
| abstract_inverted_index.from | 130 |
| abstract_inverted_index.full | 46 |
| abstract_inverted_index.have | 4 |
| abstract_inverted_index.many | 24 |
| abstract_inverted_index.need | 23 |
| abstract_inverted_index.show | 155 |
| abstract_inverted_index.such | 175 |
| abstract_inverted_index.task | 105, 112, 128 |
| abstract_inverted_index.than | 81 |
| abstract_inverted_index.that | 63, 90, 126, 156 |
| abstract_inverted_index.they | 21 |
| abstract_inverted_index.this | 54, 137 |
| abstract_inverted_index.with | 75 |
| abstract_inverted_index.Large | 0 |
| abstract_inverted_index.Then, | 134 |
| abstract_inverted_index.broad | 14 |
| abstract_inverted_index.every | 39 |
| abstract_inverted_index.fewer | 77 |
| abstract_inverted_index.given | 132 |
| abstract_inverted_index.often | 22 |
| abstract_inverted_index.prior | 83 |
| abstract_inverted_index.state | 152 |
| abstract_inverted_index.their | 13 |
| abstract_inverted_index.using | 149, 181 |
| abstract_inverted_index.while | 173 |
| abstract_inverted_index.work. | 85 |
| abstract_inverted_index.(LLMs) | 3 |
| abstract_inverted_index.Models | 2 |
| abstract_inverted_index.across | 17 |
| abstract_inverted_index.author | 123 |
| abstract_inverted_index.costly | 36 |
| abstract_inverted_index.counts | 179 |
| abstract_inverted_index.domain | 15, 113 |
| abstract_inverted_index.gains, | 172 |
| abstract_inverted_index.notion | 138 |
| abstract_inverted_index.paper, | 55 |
| abstract_inverted_index.policy | 47 |
| abstract_inverted_index.reduce | 92 |
| abstract_inverted_index.reward | 9, 31, 41, 60, 78, 186 |
| abstract_inverted_index.short. | 189 |
| abstract_inverted_index.space. | 153 |
| abstract_inverted_index.state. | 133 |
| abstract_inverted_index.tasks. | 19 |
| abstract_inverted_index.because | 37 |
| abstract_inverted_index.generic | 177 |
| abstract_inverted_index.insight | 88 |
| abstract_inverted_index.problem | 94, 101 |
| abstract_inverted_index.process | 34, 49 |
| abstract_inverted_index.produce | 67 |
| abstract_inverted_index.rewards | 98, 148, 166 |
| abstract_inverted_index.sampled | 40 |
| abstract_inverted_index.samples | 80 |
| abstract_inverted_index.states, | 143 |
| abstract_inverted_index.various | 18 |
| abstract_inverted_index.However, | 20 |
| abstract_inverted_index.Language | 1 |
| abstract_inverted_index.automate | 8 |
| abstract_inverted_index.coarsely | 103 |
| abstract_inverted_index.directly | 183 |
| abstract_inverted_index.estimate | 127 |
| abstract_inverted_index.function | 42, 79, 187 |
| abstract_inverted_index.generate | 29, 145 |
| abstract_inverted_index.policies | 69 |
| abstract_inverted_index.progress | 124, 129, 140, 161, 182 |
| abstract_inverted_index.requires | 43 |
| abstract_inverted_index.solution | 109 |
| abstract_inverted_index.two-step | 108 |
| abstract_inverted_index.abilities | 119 |
| abstract_inverted_index.benchmark | 74 |
| abstract_inverted_index.effective | 30 |
| abstract_inverted_index.essential | 168 |
| abstract_inverted_index.framework | 62 |
| abstract_inverted_index.function. | 52 |
| abstract_inverted_index.functions | 125, 162 |
| abstract_inverted_index.intrinsic | 147, 165 |
| abstract_inverted_index.introduce | 57 |
| abstract_inverted_index.knowledge | 16, 114 |
| abstract_inverted_index.leverages | 110 |
| abstract_inverted_index.potential | 6 |
| abstract_inverted_index.progress. | 106 |
| abstract_inverted_index.synthesis | 118 |
| abstract_inverted_index.LLM-driven | 59 |
| abstract_inverted_index.completing | 44 |
| abstract_inverted_index.discretize | 142 |
| abstract_inverted_index.estimating | 104 |
| abstract_inverted_index.evaluating | 38 |
| abstract_inverted_index.functions. | 32 |
| abstract_inverted_index.generating | 96 |
| abstract_inverted_index.generation | 61 |
| abstract_inverted_index.hash-based | 178 |
| abstract_inverted_index.iterations | 25 |
| abstract_inverted_index.leveraging | 12 |
| abstract_inverted_index.Bi-DexHands | 73 |
| abstract_inverted_index.challenging | 72 |
| abstract_inverted_index.combination | 158 |
| abstract_inverted_index.count-based | 146, 164 |
| abstract_inverted_index.engineering | 10 |
| abstract_inverted_index.performance | 171 |
| abstract_inverted_index.alternatives | 174 |
| abstract_inverted_index.optimization | 48 |
| abstract_inverted_index.LLM-generated | 160 |
| abstract_inverted_index.task-specific | 97 |
| abstract_inverted_index.low-dimensional | 151 |
| abstract_inverted_index.trial-and-error | 27 |
| abstract_inverted_index.state-of-the-art | 68, 84 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 6 |
| citation_normalized_percentile |
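
The `abstract_inverted_index.*` rows in the payload store the abstract as a map from each word to the positions where it occurs. A minimal sketch of turning such a map back into running text follows; the function name `rebuild_abstract` is hypothetical, and `sample` contains only the first eight positions copied from the rows above (words such as "the" and "to" occur at further positions that are omitted here).

```python
from typing import Dict, List


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Rebuild text from a word -> positions map by sorting on position."""
    by_position = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = word
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Subset of the payload: only positions 0 through 7 are included.
sample = {
    "Large": [0],
    "Language": [1],
    "Models": [2],
    "(LLMs)": [3],
    "have": [4],
    "the": [5],
    "potential": [6],
    "to": [7],
}

text = rebuild_abstract(sample)
```

Applied to the full set of rows, the same procedure should reproduce the abstract printed near the top of this record.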