Self-Supervised Exploration via Temporal Inconsistency in Reinforcement Learning
2022 · Open Access · DOI: https://doi.org/10.48550/arxiv.2208.11361
Under sparse extrinsic reward settings, reinforcement learning has remained challenging, despite surging interest in this field. Previous attempts suggest that intrinsic rewards can alleviate the issue caused by sparsity. In this article, we present a novel intrinsic reward inspired by human learning, as humans evaluate curiosity by comparing current observations with historical knowledge. Our method involves training a self-supervised prediction model, saving snapshots of the model parameters, and using the nuclear norm to evaluate the temporal inconsistency between the predictions of different snapshots as the intrinsic reward. We also propose a variational weighting mechanism to assign weights to different snapshots in an adaptive manner. Our experimental results on various benchmark environments demonstrate the efficacy of our method, which outperforms other intrinsic reward-based methods without additional training costs and with higher noise tolerance. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
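The abstract does not give the exact formula, so the following is only a minimal sketch of how a nuclear-norm inconsistency score over snapshot predictions might look. The function name `temporal_inconsistency_reward`, the centering step (subtracting the weighted mean prediction so that identical predictions score zero), and the fixed `weights` argument standing in for the variational weighting mechanism are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def temporal_inconsistency_reward(snapshot_predictions, weights=None):
    """Illustrative intrinsic reward (assumed form, not the paper's exact one).

    snapshot_predictions: array-like of shape (K, d), one prediction per
        saved model snapshot for the same observation.
    weights: optional length-K non-negative weights over snapshots
        (a hypothetical stand-in for the adaptive weighting).
    """
    preds = np.asarray(snapshot_predictions, dtype=float)
    if preds.ndim != 2:
        raise ValueError("expected a (K, d) matrix of snapshot predictions")

    k = preds.shape[0]
    if weights is None:
        w = np.full(k, 1.0 / k)          # uniform weighting by default
    else:
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()                  # normalize so the weights sum to 1

    # Assumption: measure disagreement around the weighted mean prediction.
    mean_prediction = w @ preds
    deviations = (preds - mean_prediction) * w[:, None]

    # Nuclear norm = sum of singular values of the deviation matrix.
    return float(np.linalg.norm(deviations, ord="nuc"))
```

Under these assumptions, snapshots that agree exactly yield a reward of zero, while disagreement between older and newer snapshots yields a positive reward; a snapshot given zero weight does not contribute.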
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2208.11361, https://arxiv.org/pdf/2208.11361
- OA Status: green
- Cited By: 1
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4293169624
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4293169624 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2208.11361 (Digital Object Identifier)
- Title: Self-Supervised Exploration via Temporal Inconsistency in Reinforcement Learning (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2022 (year of publication)
- Publication date: 2022-08-24 (full publication date if available)
- Authors: Zijian Gao, Kele Xu, HengXing Cai, Yuanzhao Zhai, Dawei Feng, Bo Ding, Xinjun Mao, Huaimin Wang (list of authors in order)
- Landing page: https://arxiv.org/abs/2208.11361 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2208.11361 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2208.11361 (direct OA link when available)
- Concepts: Reinforcement learning, Computer science, Curiosity, Artificial intelligence, Benchmark (surveying), Machine learning, Weighting, Notice, Psychology, Social psychology, Radiology, Geography, Medicine, Geodesy, Law, Political science (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 1 (total citation count in OpenAlex)
- Citations by year (recent): 2023: 1 (per-year citation counts, last 5 years)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4293169624 |
| doi | https://doi.org/10.48550/arxiv.2208.11361 |
| ids.doi | https://doi.org/10.48550/arxiv.2208.11361 |
| ids.openalex | https://openalex.org/W4293169624 |
| fwci | |
| type | preprint |
| title | Self-Supervised Exploration via Temporal Inconsistency in Reinforcement Learning |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10581 |
| topics[0].field.id | https://openalex.org/fields/28 |
| topics[0].field.display_name | Neuroscience |
| topics[0].score | 0.9083999991416931 |
| topics[0].domain.id | https://openalex.org/domains/1 |
| topics[0].domain.display_name | Life Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/2805 |
| topics[0].subfield.display_name | Cognitive Neuroscience |
| topics[0].display_name | Neural dynamics and brain function |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C97541855 |
| concepts[0].level | 2 |
| concepts[0].score | 0.7579803466796875 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q830687 |
| concepts[0].display_name | Reinforcement learning |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.6877408027648926 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C33435437 |
| concepts[2].level | 2 |
| concepts[2].score | 0.6742708683013916 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q366791 |
| concepts[2].display_name | Curiosity |
| concepts[3].id | https://openalex.org/C154945302 |
| concepts[3].level | 1 |
| concepts[3].score | 0.6253281831741333 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[3].display_name | Artificial intelligence |
| concepts[4].id | https://openalex.org/C185798385 |
| concepts[4].level | 2 |
| concepts[4].score | 0.5974135994911194 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q1161707 |
| concepts[4].display_name | Benchmark (surveying) |
| concepts[5].id | https://openalex.org/C119857082 |
| concepts[5].level | 1 |
| concepts[5].score | 0.5912570357322693 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q2539 |
| concepts[5].display_name | Machine learning |
| concepts[6].id | https://openalex.org/C183115368 |
| concepts[6].level | 2 |
| concepts[6].score | 0.4432836174964905 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q856577 |
| concepts[6].display_name | Weighting |
| concepts[7].id | https://openalex.org/C2779913896 |
| concepts[7].level | 2 |
| concepts[7].score | 0.4205784499645233 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q7063001 |
| concepts[7].display_name | Notice |
| concepts[8].id | https://openalex.org/C15744967 |
| concepts[8].level | 0 |
| concepts[8].score | 0.22424134612083435 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q9418 |
| concepts[8].display_name | Psychology |
| concepts[9].id | https://openalex.org/C77805123 |
| concepts[9].level | 1 |
| concepts[9].score | 0.1004587709903717 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q161272 |
| concepts[9].display_name | Social psychology |
| concepts[10].id | https://openalex.org/C126838900 |
| concepts[10].level | 1 |
| concepts[10].score | 0.0 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q77604 |
| concepts[10].display_name | Radiology |
| concepts[11].id | https://openalex.org/C205649164 |
| concepts[11].level | 0 |
| concepts[11].score | 0.0 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q1071 |
| concepts[11].display_name | Geography |
| concepts[12].id | https://openalex.org/C71924100 |
| concepts[12].level | 0 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q11190 |
| concepts[12].display_name | Medicine |
| concepts[13].id | https://openalex.org/C13280743 |
| concepts[13].level | 1 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q131089 |
| concepts[13].display_name | Geodesy |
| concepts[14].id | https://openalex.org/C199539241 |
| concepts[14].level | 1 |
| concepts[14].score | 0.0 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q7748 |
| concepts[14].display_name | Law |
| concepts[15].id | https://openalex.org/C17744445 |
| concepts[15].level | 0 |
| concepts[15].score | 0.0 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q36442 |
| concepts[15].display_name | Political science |
| keywords[0].id | https://openalex.org/keywords/reinforcement-learning |
| keywords[0].score | 0.7579803466796875 |
| keywords[0].display_name | Reinforcement learning |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.6877408027648926 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/curiosity |
| keywords[2].score | 0.6742708683013916 |
| keywords[2].display_name | Curiosity |
| keywords[3].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[3].score | 0.6253281831741333 |
| keywords[3].display_name | Artificial intelligence |
| keywords[4].id | https://openalex.org/keywords/benchmark |
| keywords[4].score | 0.5974135994911194 |
| keywords[4].display_name | Benchmark (surveying) |
| keywords[5].id | https://openalex.org/keywords/machine-learning |
| keywords[5].score | 0.5912570357322693 |
| keywords[5].display_name | Machine learning |
| keywords[6].id | https://openalex.org/keywords/weighting |
| keywords[6].score | 0.4432836174964905 |
| keywords[6].display_name | Weighting |
| keywords[7].id | https://openalex.org/keywords/notice |
| keywords[7].score | 0.4205784499645233 |
| keywords[7].display_name | Notice |
| keywords[8].id | https://openalex.org/keywords/psychology |
| keywords[8].score | 0.22424134612083435 |
| keywords[8].display_name | Psychology |
| keywords[9].id | https://openalex.org/keywords/social-psychology |
| keywords[9].score | 0.1004587709903717 |
| keywords[9].display_name | Social psychology |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2208.11361 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2208.11361 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2208.11361 |
| locations[1].id | doi:10.48550/arxiv.2208.11361 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2208.11361 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5053023740 |
| authorships[0].author.orcid | https://orcid.org/0000-0003-3864-738X |
| authorships[0].author.display_name | Zijian Gao |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Gao, Zijian |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5013340793 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-5997-5169 |
| authorships[1].author.display_name | Kele Xu |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Xu, Kele |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5022710885 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | HengXing Cai |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Cai, HengXing |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5073132517 |
| authorships[3].author.orcid | https://orcid.org/0000-0003-1385-0074 |
| authorships[3].author.display_name | Yuanzhao Zhai |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Zhai, Yuanzhao |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5039795290 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-7587-8905 |
| authorships[4].author.display_name | Dawei Feng |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Feng, Dawei |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5101888603 |
| authorships[5].author.orcid | https://orcid.org/0000-0002-9977-7332 |
| authorships[5].author.display_name | Bo Ding |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Ding, Bo |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5083124500 |
| authorships[6].author.orcid | https://orcid.org/0000-0001-6003-5748 |
| authorships[6].author.display_name | Xinjun Mao |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Mao, XinJun |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5101522100 |
| authorships[7].author.orcid | https://orcid.org/0000-0002-3245-1901 |
| authorships[7].author.display_name | Huaimin Wang |
| authorships[7].author_position | last |
| authorships[7].raw_author_name | Wang, Huaimin |
| authorships[7].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2208.11361 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2022-08-27T00:00:00 |
| display_name | Self-Supervised Exploration via Temporal Inconsistency in Reinforcement Learning |
| has_fulltext | True |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10581 |
| primary_topic.field.id | https://openalex.org/fields/28 |
| primary_topic.field.display_name | Neuroscience |
| primary_topic.score | 0.9083999991416931 |
| primary_topic.domain.id | https://openalex.org/domains/1 |
| primary_topic.domain.display_name | Life Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/2805 |
| primary_topic.subfield.display_name | Cognitive Neuroscience |
| primary_topic.display_name | Neural dynamics and brain function |
| related_works | https://openalex.org/W3094054656, https://openalex.org/W4285676344, https://openalex.org/W2123270665, https://openalex.org/W4382584175, https://openalex.org/W2060310955, https://openalex.org/W2284924956, https://openalex.org/W3043413210, https://openalex.org/W2613740288, https://openalex.org/W4252460700, https://openalex.org/W4383268304 |
| cited_by_count | 1 |
| counts_by_year[0].year | 2023 |
| counts_by_year[0].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2208.11361 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2208.11361 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2208.11361 |
| primary_location.id | pmh:oai:arXiv.org:2208.11361 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2208.11361 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2208.11361 |
| publication_date | 2022-08-24 |
| publication_year | 2022 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 34, 59, 90 |
| abstract_inverted_index.In | 29 |
| abstract_inverted_index.We | 87 |
| abstract_inverted_index.an | 101 |
| abstract_inverted_index.as | 44, 84 |
| abstract_inverted_index.be | 145, 156 |
| abstract_inverted_index.by | 27, 41, 48 |
| abstract_inverted_index.in | 13, 100 |
| abstract_inverted_index.is | 39 |
| abstract_inverted_index.no | 154 |
| abstract_inverted_index.of | 65, 81, 114 |
| abstract_inverted_index.on | 107 |
| abstract_inverted_index.to | 73, 94, 97, 137 |
| abstract_inverted_index.we | 32 |
| abstract_inverted_index.Our | 55, 104 |
| abstract_inverted_index.and | 69, 127 |
| abstract_inverted_index.can | 22 |
| abstract_inverted_index.for | 140 |
| abstract_inverted_index.has | 7, 134 |
| abstract_inverted_index.may | 144, 153 |
| abstract_inverted_index.our | 115 |
| abstract_inverted_index.the | 24, 66, 75, 79, 112, 138 |
| abstract_inverted_index.IEEE | 139 |
| abstract_inverted_index.This | 132 |
| abstract_inverted_index.also | 88 |
| abstract_inverted_index.been | 135 |
| abstract_inverted_index.norm | 72 |
| abstract_inverted_index.that | 19, 38 |
| abstract_inverted_index.this | 14, 30, 151 |
| abstract_inverted_index.with | 52, 128 |
| abstract_inverted_index.work | 133 |
| abstract_inverted_index.Under | 0 |
| abstract_inverted_index.after | 149 |
| abstract_inverted_index.costs | 126 |
| abstract_inverted_index.human | 42 |
| abstract_inverted_index.issue | 25 |
| abstract_inverted_index.model | 67 |
| abstract_inverted_index.noise | 130 |
| abstract_inverted_index.novel | 35 |
| abstract_inverted_index.other | 119 |
| abstract_inverted_index.using | 70 |
| abstract_inverted_index.which | 117, 150 |
| abstract_inverted_index.assign | 95 |
| abstract_inverted_index.caused | 26 |
| abstract_inverted_index.field. | 15 |
| abstract_inverted_index.higher | 129 |
| abstract_inverted_index.humans | 45 |
| abstract_inverted_index.longer | 155 |
| abstract_inverted_index.method | 56 |
| abstract_inverted_index.model, | 62 |
| abstract_inverted_index.reward | 3, 21, 37 |
| abstract_inverted_index.saving | 63 |
| abstract_inverted_index.sparse | 1 |
| abstract_inverted_index.weight | 96 |
| abstract_inverted_index.between | 78 |
| abstract_inverted_index.current | 50 |
| abstract_inverted_index.despite | 10 |
| abstract_inverted_index.manner. | 103 |
| abstract_inverted_index.method, | 116 |
| abstract_inverted_index.methods | 122 |
| abstract_inverted_index.notice, | 148 |
| abstract_inverted_index.nuclear | 71 |
| abstract_inverted_index.present | 33 |
| abstract_inverted_index.propose | 89 |
| abstract_inverted_index.results | 106 |
| abstract_inverted_index.suggest | 18 |
| abstract_inverted_index.surging | 11 |
| abstract_inverted_index.various | 108 |
| abstract_inverted_index.version | 152 |
| abstract_inverted_index.without | 123, 147 |
| abstract_inverted_index.Previous | 16 |
| abstract_inverted_index.adaptive | 102 |
| abstract_inverted_index.article, | 31 |
| abstract_inverted_index.attempts | 17 |
| abstract_inverted_index.efficacy | 113 |
| abstract_inverted_index.evaluate | 46, 74 |
| abstract_inverted_index.inspired | 40 |
| abstract_inverted_index.involves | 57 |
| abstract_inverted_index.learning | 6 |
| abstract_inverted_index.possible | 141 |
| abstract_inverted_index.remained | 8 |
| abstract_inverted_index.rewards. | 86 |
| abstract_inverted_index.temporal | 76 |
| abstract_inverted_index.training | 58, 125 |
| abstract_inverted_index.Copyright | 143 |
| abstract_inverted_index.alleviate | 23 |
| abstract_inverted_index.benchmark | 109 |
| abstract_inverted_index.comparing | 49 |
| abstract_inverted_index.curiosity | 47 |
| abstract_inverted_index.different | 82, 98 |
| abstract_inverted_index.extrinsic | 2 |
| abstract_inverted_index.interests | 12 |
| abstract_inverted_index.intrinsic | 20, 36, 85, 120 |
| abstract_inverted_index.learning, | 43 |
| abstract_inverted_index.mechanism | 93 |
| abstract_inverted_index.settings, | 4 |
| abstract_inverted_index.snapshots | 64, 83, 99 |
| abstract_inverted_index.sparsity. | 28 |
| abstract_inverted_index.submitted | 136 |
| abstract_inverted_index.weighting | 92 |
| abstract_inverted_index.additional | 124 |
| abstract_inverted_index.historical | 53 |
| abstract_inverted_index.knowledge. | 54 |
| abstract_inverted_index.prediction | 61 |
| abstract_inverted_index.tolerance. | 131 |
| abstract_inverted_index.accessible. | 157 |
| abstract_inverted_index.demonstrate | 111 |
| abstract_inverted_index.outperforms | 118 |
| abstract_inverted_index.parameters, | 68 |
| abstract_inverted_index.predictions | 80 |
| abstract_inverted_index.transferred | 146 |
| abstract_inverted_index.variational | 91 |
| abstract_inverted_index.challenging, | 9 |
| abstract_inverted_index.environments | 110 |
| abstract_inverted_index.experimental | 105 |
| abstract_inverted_index.observations | 51 |
| abstract_inverted_index.publication. | 142 |
| abstract_inverted_index.reward-based | 121 |
| abstract_inverted_index.inconsistency | 77 |
| abstract_inverted_index.reinforcement | 5 |
| abstract_inverted_index.self-supervised | 60 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 8 |
| citation_normalized_percentile | |
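The `abstract_inverted_index.*` rows in the table above store the abstract as a mapping from each token to the word positions at which it occurs. As a rough sketch, the text can be rebuilt by inverting that mapping and joining the tokens in position order. The helper name `rebuild_abstract` and the `sample_index` excerpt (deliberately limited to positions 0 through 4, so `reward` lists only its first position) are illustrative assumptions.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a {token: [positions]} inverted index."""
    token_at = {}
    for token, positions in inverted_index.items():
        for position in positions:
            token_at[position] = token
    # Join tokens in ascending position order, separated by single spaces.
    return " ".join(token_at[position] for position in sorted(token_at))


# Excerpt of the table rows, truncated to the first five word positions.
sample_index = {
    "Under": [0],
    "sparse": [1],
    "extrinsic": [2],
    "reward": [3],
    "settings,": [4],
}

print(rebuild_abstract(sample_index))  # Under sparse extrinsic reward settings,
```

Because punctuation is stored as part of each token, no extra handling is needed beyond joining on spaces.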