Reinforcement Learning with Information-Theoretic Actuation
2021 · Open Access · DOI: https://doi.org/10.48550/arxiv.2109.15147
Reinforcement Learning formalises an embodied agent's interaction with the environment through observations, rewards and actions. But where do the actions come from? Actions are often considered to represent something external, such as the movement of a limb, a chess piece, or more generally, the output of an actuator. In this work we explore and formalize a contrasting view, namely that actions are best thought of as the output of a sequence of internal choices with respect to an action model. This view is particularly well-suited for leveraging the recent advances in large sequence models as prior knowledge for multi-task reinforcement learning problems. Our main contribution in this work is to show how to augment the standard MDP formalism with a sequential notion of internal action using information-theoretic techniques, and that this leads to self-consistent definitions of both internal and external action value functions.
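One way to picture "a sequence of internal choices with respect to an action model" is as a walk down a binary code tree built from a prior over external actions. The sketch below is only an illustration under stated assumptions, not the construction used in the paper: the prior `prior`, the use of Huffman coding as the action model's code, and the helper names `build_prefix_code` and `decode_internal_choices` are all hypothetical.

```python
import heapq
from itertools import count


def build_prefix_code(action_prior):
    """Return {action: bitstring} by Huffman-coding a (hypothetical) action prior."""
    tie = count()  # unique tie-breaker so heapq never has to compare dicts
    heap = [(p, next(tie), {a: ""}) for a, p in action_prior.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single action still needs one bit to be selected.
        return {a: "0" for a in heap[0][2]}
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {a: "0" + bits for a, bits in c0.items()}
        merged.update({a: "1" + bits for a, bits in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]


def decode_internal_choices(code, choices):
    """Consume binary internal choices until they spell out one external action."""
    by_bits = {bits: action for action, bits in code.items()}
    prefix = ""
    for bit in choices:
        prefix += str(bit)
        if prefix in by_bits:
            return by_bits[prefix]
    return None  # the choices made so far do not yet determine an external action


# Assumed prior over four external actions (illustrative values only).
prior = {"left": 0.5, "right": 0.25, "jump": 0.125, "wait": 0.125}
code = build_prefix_code(prior)
```

In this toy setup, actions the prior considers likely need fewer internal choices than unlikely ones, which is the intuition behind using a strong sequence model as prior knowledge.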
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2109.15147, https://arxiv.org/pdf/2109.15147
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4315798539
Raw OpenAlex JSON
- OpenAlex ID (canonical identifier for this work in OpenAlex): https://openalex.org/W4315798539
- DOI (Digital Object Identifier): https://doi.org/10.48550/arxiv.2109.15147
- Title (work title): Reinforcement Learning with Information-Theoretic Actuation
- Type (OpenAlex work type): preprint
- Language (primary language): en
- Publication year (year of publication): 2021
- Publication date (full publication date if available): 2021-09-30
- Authors (list of authors in order): Elliot Catt, Marcus Hütter, Joel Veness
- Landing page (publisher landing page): https://arxiv.org/abs/2109.15147
- PDF URL (direct link to full text PDF): https://arxiv.org/pdf/2109.15147
- Open access (whether a free full text is available): Yes
- OA status (open access status per OpenAlex): green
- OA URL (direct OA link when available): https://arxiv.org/pdf/2109.15147
- Concepts (top fields/topics attached by OpenAlex): Reinforcement learning, Computer science, Embodied cognition, Action (physics), Formalism (music), Reinforcement, Artificial intelligence, Sequence (biology), Cognitive science, Human–computer interaction, Psychology, Social psychology, Quantum mechanics, Art, Physics, Visual arts, Musical, Biology, Genetics
- Cited by (total citation count in OpenAlex): 0
- Related works (count) (other works algorithmically related by OpenAlex): 10
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4315798539 |
| doi | https://doi.org/10.48550/arxiv.2109.15147 |
| ids.doi | https://doi.org/10.48550/arxiv.2109.15147 |
| ids.openalex | https://openalex.org/W4315798539 |
| fwci | |
| type | preprint |
| title | Reinforcement Learning with Information-Theoretic Actuation |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10462 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9819999933242798 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Reinforcement Learning in Robotics |
| topics[1].id | https://openalex.org/T11975 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9724000096321106 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Evolutionary Algorithms and Applications |
| topics[2].id | https://openalex.org/T10142 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9577000141143799 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1703 |
| topics[2].subfield.display_name | Computational Theory and Mathematics |
| topics[2].display_name | Formal Methods in Verification |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C97541855 |
| concepts[0].level | 2 |
| concepts[0].score | 0.8609135150909424 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q830687 |
| concepts[0].display_name | Reinforcement learning |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.6899751424789429 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C100609095 |
| concepts[2].level | 2 |
| concepts[2].score | 0.6633135676383972 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q1335050 |
| concepts[2].display_name | Embodied cognition |
| concepts[3].id | https://openalex.org/C2780791683 |
| concepts[3].level | 2 |
| concepts[3].score | 0.535802960395813 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q846785 |
| concepts[3].display_name | Action (physics) |
| concepts[4].id | https://openalex.org/C73301696 |
| concepts[4].level | 3 |
| concepts[4].score | 0.5047348737716675 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q5469984 |
| concepts[4].display_name | Formalism (music) |
| concepts[5].id | https://openalex.org/C67203356 |
| concepts[5].level | 2 |
| concepts[5].score | 0.5002012252807617 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q1321905 |
| concepts[5].display_name | Reinforcement |
| concepts[6].id | https://openalex.org/C154945302 |
| concepts[6].level | 1 |
| concepts[6].score | 0.49832701683044434 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[6].display_name | Artificial intelligence |
| concepts[7].id | https://openalex.org/C2778112365 |
| concepts[7].level | 2 |
| concepts[7].score | 0.48041653633117676 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q3511065 |
| concepts[7].display_name | Sequence (biology) |
| concepts[8].id | https://openalex.org/C188147891 |
| concepts[8].level | 1 |
| concepts[8].score | 0.46579229831695557 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q147638 |
| concepts[8].display_name | Cognitive science |
| concepts[9].id | https://openalex.org/C107457646 |
| concepts[9].level | 1 |
| concepts[9].score | 0.4208364486694336 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q207434 |
| concepts[9].display_name | Human–computer interaction |
| concepts[10].id | https://openalex.org/C15744967 |
| concepts[10].level | 0 |
| concepts[10].score | 0.13227331638336182 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q9418 |
| concepts[10].display_name | Psychology |
| concepts[11].id | https://openalex.org/C77805123 |
| concepts[11].level | 1 |
| concepts[11].score | 0.0721106231212616 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q161272 |
| concepts[11].display_name | Social psychology |
| concepts[12].id | https://openalex.org/C62520636 |
| concepts[12].level | 1 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q944 |
| concepts[12].display_name | Quantum mechanics |
| concepts[13].id | https://openalex.org/C142362112 |
| concepts[13].level | 0 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q735 |
| concepts[13].display_name | Art |
| concepts[14].id | https://openalex.org/C121332964 |
| concepts[14].level | 0 |
| concepts[14].score | 0.0 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q413 |
| concepts[14].display_name | Physics |
| concepts[15].id | https://openalex.org/C153349607 |
| concepts[15].level | 1 |
| concepts[15].score | 0.0 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q36649 |
| concepts[15].display_name | Visual arts |
| concepts[16].id | https://openalex.org/C558565934 |
| concepts[16].level | 2 |
| concepts[16].score | 0.0 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q2743 |
| concepts[16].display_name | Musical |
| concepts[17].id | https://openalex.org/C86803240 |
| concepts[17].level | 0 |
| concepts[17].score | 0.0 |
| concepts[17].wikidata | https://www.wikidata.org/wiki/Q420 |
| concepts[17].display_name | Biology |
| concepts[18].id | https://openalex.org/C54355233 |
| concepts[18].level | 1 |
| concepts[18].score | 0.0 |
| concepts[18].wikidata | https://www.wikidata.org/wiki/Q7162 |
| concepts[18].display_name | Genetics |
| keywords[0].id | https://openalex.org/keywords/reinforcement-learning |
| keywords[0].score | 0.8609135150909424 |
| keywords[0].display_name | Reinforcement learning |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.6899751424789429 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/embodied-cognition |
| keywords[2].score | 0.6633135676383972 |
| keywords[2].display_name | Embodied cognition |
| keywords[3].id | https://openalex.org/keywords/action |
| keywords[3].score | 0.535802960395813 |
| keywords[3].display_name | Action (physics) |
| keywords[4].id | https://openalex.org/keywords/formalism |
| keywords[4].score | 0.5047348737716675 |
| keywords[4].display_name | Formalism (music) |
| keywords[5].id | https://openalex.org/keywords/reinforcement |
| keywords[5].score | 0.5002012252807617 |
| keywords[5].display_name | Reinforcement |
| keywords[6].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[6].score | 0.49832701683044434 |
| keywords[6].display_name | Artificial intelligence |
| keywords[7].id | https://openalex.org/keywords/sequence |
| keywords[7].score | 0.48041653633117676 |
| keywords[7].display_name | Sequence (biology) |
| keywords[8].id | https://openalex.org/keywords/cognitive-science |
| keywords[8].score | 0.46579229831695557 |
| keywords[8].display_name | Cognitive science |
| keywords[9].id | https://openalex.org/keywords/human–computer-interaction |
| keywords[9].score | 0.4208364486694336 |
| keywords[9].display_name | Human–computer interaction |
| keywords[10].id | https://openalex.org/keywords/psychology |
| keywords[10].score | 0.13227331638336182 |
| keywords[10].display_name | Psychology |
| keywords[11].id | https://openalex.org/keywords/social-psychology |
| keywords[11].score | 0.0721106231212616 |
| keywords[11].display_name | Social psychology |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2109.15147 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2109.15147 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2109.15147 |
| locations[1].id | doi:10.48550/arxiv.2109.15147 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2109.15147 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5020795308 |
| authorships[0].author.orcid | https://orcid.org/0000-0001-9411-927X |
| authorships[0].author.display_name | Elliot Catt |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Catt, Elliot |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5073944062 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-3263-4097 |
| authorships[1].author.display_name | Marcus Hütter |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Hutter, Marcus |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5060709021 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Joel Veness |
| authorships[2].author_position | last |
| authorships[2].raw_author_name | Veness, Joel |
| authorships[2].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2109.15147 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Reinforcement Learning with Information-Theoretic Actuation |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10462 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9819999933242798 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Reinforcement Learning in Robotics |
| related_works | https://openalex.org/W2380179524, https://openalex.org/W4283365723, https://openalex.org/W2963001125, https://openalex.org/W2091233881, https://openalex.org/W2352366064, https://openalex.org/W4250820896, https://openalex.org/W2124102101, https://openalex.org/W4250305970, https://openalex.org/W1484550171, https://openalex.org/W2333383158 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2109.15147 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2109.15147 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2109.15147 |
| primary_location.id | pmh:oai:arXiv.org:2109.15147 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2109.15147 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2109.15147 |
| publication_date | 2021-09-30 |
| publication_year | 2021 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 35, 37, 55, 69, 119 |
| abstract_inverted_index.In | 48 |
| abstract_inverted_index.an | 3, 46, 77 |
| abstract_inverted_index.as | 31, 65, 94 |
| abstract_inverted_index.do | 17 |
| abstract_inverted_index.in | 90, 105 |
| abstract_inverted_index.is | 82, 108 |
| abstract_inverted_index.of | 34, 45, 64, 68, 71, 122, 135 |
| abstract_inverted_index.or | 40 |
| abstract_inverted_index.to | 26, 76, 109, 112, 132 |
| abstract_inverted_index.we | 51 |
| abstract_inverted_index.But | 15 |
| abstract_inverted_index.MDP | 116 |
| abstract_inverted_index.Our | 102 |
| abstract_inverted_index.and | 13, 53, 128, 138 |
| abstract_inverted_index.are | 23, 61 |
| abstract_inverted_index.for | 85, 97 |
| abstract_inverted_index.how | 111 |
| abstract_inverted_index.the | 8, 18, 32, 43, 66, 87, 114 |
| abstract_inverted_index.This | 80 |
| abstract_inverted_index.best | 62 |
| abstract_inverted_index.both | 136 |
| abstract_inverted_index.come | 20 |
| abstract_inverted_index.main | 103 |
| abstract_inverted_index.more | 41 |
| abstract_inverted_index.show | 110 |
| abstract_inverted_index.such | 30 |
| abstract_inverted_index.that | 59, 129 |
| abstract_inverted_index.this | 49, 106, 130 |
| abstract_inverted_index.view | 81 |
| abstract_inverted_index.with | 7, 74, 118 |
| abstract_inverted_index.work | 50, 107 |
| abstract_inverted_index.chess | 38 |
| abstract_inverted_index.from? | 21 |
| abstract_inverted_index.large | 91 |
| abstract_inverted_index.leads | 131 |
| abstract_inverted_index.limb, | 36 |
| abstract_inverted_index.often | 24 |
| abstract_inverted_index.prior | 95 |
| abstract_inverted_index.using | 125 |
| abstract_inverted_index.value | 141 |
| abstract_inverted_index.view, | 57 |
| abstract_inverted_index.where | 16 |
| abstract_inverted_index.action | 78, 124, 140 |
| abstract_inverted_index.model. | 79 |
| abstract_inverted_index.models | 93 |
| abstract_inverted_index.namely | 58 |
| abstract_inverted_index.notion | 121 |
| abstract_inverted_index.output | 44, 67 |
| abstract_inverted_index.piece, | 39 |
| abstract_inverted_index.recent | 88 |
| abstract_inverted_index.Actions | 22 |
| abstract_inverted_index.actions | 19, 60 |
| abstract_inverted_index.agent's | 5 |
| abstract_inverted_index.augment | 113 |
| abstract_inverted_index.choices | 73 |
| abstract_inverted_index.explore | 52 |
| abstract_inverted_index.respect | 75 |
| abstract_inverted_index.rewards | 12 |
| abstract_inverted_index.thought | 63 |
| abstract_inverted_index.through | 10 |
| abstract_inverted_index.Learning | 1 |
| abstract_inverted_index.actions. | 14 |
| abstract_inverted_index.advances | 89 |
| abstract_inverted_index.embodied | 4 |
| abstract_inverted_index.external | 139 |
| abstract_inverted_index.internal | 72, 123, 137 |
| abstract_inverted_index.learning | 100 |
| abstract_inverted_index.movement | 33 |
| abstract_inverted_index.sequence | 70, 92 |
| abstract_inverted_index.standard | 115 |
| abstract_inverted_index.actuator. | 47 |
| abstract_inverted_index.external, | 29 |
| abstract_inverted_index.formalism | 117 |
| abstract_inverted_index.formalize | 54 |
| abstract_inverted_index.knowledge | 96 |
| abstract_inverted_index.problems. | 101 |
| abstract_inverted_index.represent | 27 |
| abstract_inverted_index.something | 28 |
| abstract_inverted_index.considered | 25 |
| abstract_inverted_index.formalises | 2 |
| abstract_inverted_index.functions. | 142 |
| abstract_inverted_index.generally, | 42 |
| abstract_inverted_index.leveraging | 86 |
| abstract_inverted_index.multi-task | 98 |
| abstract_inverted_index.sequential | 120 |
| abstract_inverted_index.contrasting | 56 |
| abstract_inverted_index.definitions | 134 |
| abstract_inverted_index.environment | 9 |
| abstract_inverted_index.interaction | 6 |
| abstract_inverted_index.techniques, | 127 |
| abstract_inverted_index.well-suited | 84 |
| abstract_inverted_index.contribution | 104 |
| abstract_inverted_index.particularly | 83 |
| abstract_inverted_index.Reinforcement | 0 |
| abstract_inverted_index.observations, | 11 |
| abstract_inverted_index.reinforcement | 99 |
| abstract_inverted_index.self-consistent | 133 |
| abstract_inverted_index.information-theoretic | 126 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 3 |
| citation_normalized_percentile |
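
The `abstract_inverted_index.*` rows above map each token of the abstract to the positions where it occurs. A minimal sketch of how such an index can be turned back into running text follows; it assumes the index has been loaded as a plain `{token: [positions]}` dictionary, and the function name `rebuild_abstract` is hypothetical. The `sample` dictionary copies only the first ten positions (0 to 9) from the rows above, so the position lists for repeated tokens such as `an`, `with` and `the` are truncated here.

```python
def rebuild_abstract(inverted_index):
    """Turn {token: [positions]} back into the original token sequence."""
    positions = {}
    for token, indices in inverted_index.items():
        for i in indices:
            positions[i] = token
    # Emit tokens in position order, separated by single spaces.
    return " ".join(positions[i] for i in sorted(positions))


# First ten positions of the index shown in the table (truncated on purpose).
sample = {
    "Reinforcement": [0],
    "Learning": [1],
    "formalises": [2],
    "an": [3],
    "embodied": [4],
    "agent's": [5],
    "interaction": [6],
    "with": [7],
    "the": [8],
    "environment": [9],
}
opening = rebuild_abstract(sample)
print(opening)
```

Because punctuation stays attached to the tokens (for example `actions.` and `from?`), joining on spaces is enough to recover readable text from the full index.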