Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2508.21365
Large language models (LLMs) excel at complex reasoning tasks such as mathematics and coding, yet they frequently struggle with simple interactive tasks that young children perform effortlessly. This discrepancy highlights a critical gap between declarative knowledge (knowing about something) and procedural knowledge (knowing how to do something). Although traditional reinforcement learning (RL) agents can acquire procedural knowledge through environmental interaction, they often operate as black boxes and require substantial training data. In contrast, LLMs possess extensive world knowledge and reasoning capabilities, but are unable to effectively convert this static knowledge into dynamic decision-making in interactive settings. To address this challenge, we propose Think in Games (TiG), a novel framework that empowers LLMs to develop procedural understanding through direct interaction with game environments, while retaining their inherent reasoning and explanatory abilities. Specifically, TiG reformulates RL-based decision-making as a language modeling task: LLMs generate language-guided policies, which are refined iteratively through online reinforcement learning based on environmental feedback. Our experimental results show that TiG successfully bridges the gap between declarative and procedural knowledge, achieving competitive performance with dramatically lower data and computational demands compared to conventional RL methods. Moreover, TiG provides step-by-step natural language explanations for its decisions, greatly improving transparency and interpretability in complex interactive tasks.
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2508.21365, https://arxiv.org/pdf/2508.21365
- OA Status: green
- OpenAlex ID: https://openalex.org/W4415989343
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4415989343 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2508.21365 (Digital Object Identifier)
- Title: Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025 (year of publication)
- Publication date: 2025-08-29 (full publication date if available)
- Authors: Liao Yi, Yu Gu, Yuan Sui, Zining Zhu, Yifan Lu, Guohua Tang, Zhongqian Sun, Wei Yang (list of authors in order)
- Landing page: https://arxiv.org/abs/2508.21365 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2508.21365 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2508.21365 (direct OA link when available)
- Cited by: 0 (total citation count in OpenAlex)
Full payload
| id | https://openalex.org/W4415989343 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2508.21365 |
| ids.doi | https://doi.org/10.48550/arxiv.2508.21365 |
| ids.openalex | https://openalex.org/W4415989343 |
| fwci | |
| type | preprint |
| title | Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2508.21365 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2508.21365 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2508.21365 |
| locations[1].id | doi:10.48550/arxiv.2508.21365 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2508.21365 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5100535098 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Liao Yi |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Liao, Yi |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5100649179 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-3127-9967 |
| authorships[1].author.display_name | Yu Gu |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Gu, Yu |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5120277920 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Yuan Sui |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Sui, Yuan |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5024250477 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-9285-9378 |
| authorships[3].author.display_name | Zining Zhu |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Zhu, Zining |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5038813094 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-0661-3372 |
| authorships[4].author.display_name | Yifan Lu |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Lu, Yifan |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5100940665 |
| authorships[5].author.orcid | |
| authorships[5].author.display_name | Guohua Tang |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Tang, Guohua |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5038814111 |
| authorships[6].author.orcid | |
| authorships[6].author.display_name | Zhongqian Sun |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Sun, Zhongqian |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5036689637 |
| authorships[7].author.orcid | https://orcid.org/0000-0002-3925-0352 |
| authorships[7].author.display_name | Wei Yang |
| authorships[7].author_position | last |
| authorships[7].raw_author_name | Yang, Wei |
| authorships[7].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2508.21365 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-08T23:21:52.890332 |
| primary_topic | |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2508.21365 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2508.21365 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2508.21365 |
| primary_location.id | pmh:oai:arXiv.org:2508.21365 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2508.21365 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2508.21365 |
| publication_date | 2025-08-29 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 30, 106, 136 |
| abstract_inverted_index.In | 71 |
| abstract_inverted_index.RL | 184 |
| abstract_inverted_index.To | 96 |
| abstract_inverted_index.as | 10, 63, 135 |
| abstract_inverted_index.at | 5 |
| abstract_inverted_index.do | 45 |
| abstract_inverted_index.in | 93, 103, 201 |
| abstract_inverted_index.on | 153 |
| abstract_inverted_index.to | 44, 84, 112, 182 |
| abstract_inverted_index.we | 100 |
| abstract_inverted_index.Our | 156 |
| abstract_inverted_index.TiG | 131, 161, 187 |
| abstract_inverted_index.and | 12, 39, 66, 78, 127, 168, 178, 199 |
| abstract_inverted_index.are | 82, 145 |
| abstract_inverted_index.but | 81 |
| abstract_inverted_index.can | 53 |
| abstract_inverted_index.for | 193 |
| abstract_inverted_index.gap | 32, 165 |
| abstract_inverted_index.how | 43 |
| abstract_inverted_index.its | 194 |
| abstract_inverted_index.the | 164 |
| abstract_inverted_index.yet | 14 |
| abstract_inverted_index.(RL) | 51 |
| abstract_inverted_index.LLMs | 73, 111, 140 |
| abstract_inverted_index.This | 27 |
| abstract_inverted_index.data | 177 |
| abstract_inverted_index.game | 120 |
| abstract_inverted_index.into | 90 |
| abstract_inverted_index.show | 159 |
| abstract_inverted_index.such | 9 |
| abstract_inverted_index.that | 22, 109, 160 |
| abstract_inverted_index.they | 15, 60 |
| abstract_inverted_index.this | 87, 98 |
| abstract_inverted_index.with | 18, 119, 174 |
| abstract_inverted_index.Games | 104 |
| abstract_inverted_index.Large | 0 |
| abstract_inverted_index.Think | 102 |
| abstract_inverted_index.about | 37 |
| abstract_inverted_index.based | 152 |
| abstract_inverted_index.black | 64 |
| abstract_inverted_index.boxes | 65 |
| abstract_inverted_index.data. | 70 |
| abstract_inverted_index.excel | 4 |
| abstract_inverted_index.lower | 176 |
| abstract_inverted_index.novel | 107 |
| abstract_inverted_index.often | 61 |
| abstract_inverted_index.task: | 139 |
| abstract_inverted_index.tasks | 8, 21 |
| abstract_inverted_index.their | 124 |
| abstract_inverted_index.which | 144 |
| abstract_inverted_index.while | 122 |
| abstract_inverted_index.world | 76 |
| abstract_inverted_index.young | 23 |
| abstract_inverted_index.(LLMs) | 3 |
| abstract_inverted_index.(TiG), | 105 |
| abstract_inverted_index.agents | 52 |
| abstract_inverted_index.direct | 117 |
| abstract_inverted_index.models | 2 |
| abstract_inverted_index.online | 149 |
| abstract_inverted_index.simple | 19 |
| abstract_inverted_index.static | 88 |
| abstract_inverted_index.tasks. | 204 |
| abstract_inverted_index.unable | 83 |
| abstract_inverted_index.acquire | 54 |
| abstract_inverted_index.address | 97 |
| abstract_inverted_index.between | 33, 166 |
| abstract_inverted_index.bridges | 163 |
| abstract_inverted_index.coding, | 13 |
| abstract_inverted_index.complex | 6, 202 |
| abstract_inverted_index.convert | 86 |
| abstract_inverted_index.demands | 180 |
| abstract_inverted_index.develop | 113 |
| abstract_inverted_index.dynamic | 91 |
| abstract_inverted_index.greatly | 196 |
| abstract_inverted_index.natural | 190 |
| abstract_inverted_index.operate | 62 |
| abstract_inverted_index.perform | 25 |
| abstract_inverted_index.possess | 74 |
| abstract_inverted_index.propose | 101 |
| abstract_inverted_index.refined | 146 |
| abstract_inverted_index.require | 67 |
| abstract_inverted_index.results | 158 |
| abstract_inverted_index.through | 57, 116, 148 |
| abstract_inverted_index.(knowing | 36, 42 |
| abstract_inverted_index.Although | 47 |
| abstract_inverted_index.RL-based | 133 |
| abstract_inverted_index.children | 24 |
| abstract_inverted_index.compared | 181 |
| abstract_inverted_index.critical | 31 |
| abstract_inverted_index.empowers | 110 |
| abstract_inverted_index.generate | 141 |
| abstract_inverted_index.inherent | 125 |
| abstract_inverted_index.language | 1, 137, 191 |
| abstract_inverted_index.learning | 50, 151 |
| abstract_inverted_index.methods. | 185 |
| abstract_inverted_index.modeling | 138 |
| abstract_inverted_index.provides | 188 |
| abstract_inverted_index.struggle | 17 |
| abstract_inverted_index.training | 69 |
| abstract_inverted_index.Moreover, | 186 |
| abstract_inverted_index.achieving | 171 |
| abstract_inverted_index.contrast, | 72 |
| abstract_inverted_index.extensive | 75 |
| abstract_inverted_index.feedback. | 155 |
| abstract_inverted_index.framework | 108 |
| abstract_inverted_index.improving | 197 |
| abstract_inverted_index.knowledge | 35, 41, 56, 77, 89 |
| abstract_inverted_index.policies, | 143 |
| abstract_inverted_index.reasoning | 7, 79, 126 |
| abstract_inverted_index.retaining | 123 |
| abstract_inverted_index.settings. | 95 |
| abstract_inverted_index.abilities. | 129 |
| abstract_inverted_index.challenge, | 99 |
| abstract_inverted_index.decisions, | 195 |
| abstract_inverted_index.frequently | 16 |
| abstract_inverted_index.highlights | 29 |
| abstract_inverted_index.knowledge, | 170 |
| abstract_inverted_index.procedural | 40, 55, 114, 169 |
| abstract_inverted_index.something) | 38 |
| abstract_inverted_index.competitive | 172 |
| abstract_inverted_index.declarative | 34, 167 |
| abstract_inverted_index.discrepancy | 28 |
| abstract_inverted_index.effectively | 85 |
| abstract_inverted_index.explanatory | 128 |
| abstract_inverted_index.interaction | 118 |
| abstract_inverted_index.interactive | 20, 94, 203 |
| abstract_inverted_index.iteratively | 147 |
| abstract_inverted_index.mathematics | 11 |
| abstract_inverted_index.performance | 173 |
| abstract_inverted_index.something). | 46 |
| abstract_inverted_index.substantial | 68 |
| abstract_inverted_index.traditional | 48 |
| abstract_inverted_index.conventional | 183 |
| abstract_inverted_index.dramatically | 175 |
| abstract_inverted_index.experimental | 157 |
| abstract_inverted_index.explanations | 192 |
| abstract_inverted_index.interaction, | 59 |
| abstract_inverted_index.reformulates | 132 |
| abstract_inverted_index.step-by-step | 189 |
| abstract_inverted_index.successfully | 162 |
| abstract_inverted_index.transparency | 198 |
| abstract_inverted_index.Specifically, | 130 |
| abstract_inverted_index.capabilities, | 80 |
| abstract_inverted_index.computational | 179 |
| abstract_inverted_index.effortlessly. | 26 |
| abstract_inverted_index.environmental | 58, 154 |
| abstract_inverted_index.environments, | 121 |
| abstract_inverted_index.reinforcement | 49, 150 |
| abstract_inverted_index.understanding | 115 |
| abstract_inverted_index.decision-making | 92, 134 |
| abstract_inverted_index.language-guided | 142 |
| abstract_inverted_index.interpretability | 200 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 8 |
| citation_normalized_percentile |
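
The `abstract_inverted_index.*` rows in the payload above map each token of the abstract to the zero-based positions at which it occurs. A minimal Python sketch of how such a mapping could be turned back into running text follows. The function name `rebuild_abstract` and the `sample` dictionary are illustrative assumptions, not part of any OpenAlex client, and `sample` deliberately contains only positions 0 through 8 from the table rather than the full index.

```python
def rebuild_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild text from a token -> positions mapping (hypothetical helper)."""
    by_position: dict[int, str] = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    # Join tokens in ascending position order; gaps in the index are skipped.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Truncated subset of the rows above: only positions 0..8 are included.
sample = {
    "Large": [0],
    "language": [1],
    "models": [2],
    "(LLMs)": [3],
    "excel": [4],
    "at": [5],
    "complex": [6],
    "reasoning": [7],
    "tasks": [8],
}

print(rebuild_abstract(sample))
# prints: Large language models (LLMs) excel at complex reasoning tasks
```

Feeding the complete set of rows into the same function should yield the abstract shown at the top of this record, since every position from 0 to 204 appears exactly once in the index.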