Sample-Efficient Reinforcement Learning through Transfer and Architectural Priors
2018 · Open Access
DOI: https://doi.org/10.48550/arxiv.1801.02268
Recent work in deep reinforcement learning has allowed algorithms to learn complex tasks such as Atari 2600 games just from the reward provided by the game, but these algorithms presently require millions of training steps in order to learn, making them approximately five orders of magnitude slower than humans. One reason for this is that humans build robust shared representations that are applicable to collections of problems, making it much easier to assimilate new variants. This paper first introduces the idea of automatically-generated game sets to aid in transfer learning research, and then demonstrates the utility of shared representations by showing that models can substantially benefit from the incorporation of relevant architectural priors. This technique affords a remarkable 50x positive transfer on a toy problem-set.
- Type: preprint
- Language: en
- Landing Page: https://arxiv.org/pdf/1801.02268
- OA Status: green
- Cited By: 11
- References: 2
- Related Works: 20
- OpenAlex ID: https://openalex.org/W2782656435
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W2782656435 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.1801.02268 (Digital Object Identifier)
- Title: Sample-Efficient Reinforcement Learning through Transfer and Architectural Priors (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2018 (year of publication)
- Publication date: 2018-01-07 (full publication date if available)
- Authors: Benjamin Spector, Serge Belongie (list of authors in order)
- Landing page: https://arxiv.org/pdf/1801.02268 (publisher landing page)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/1801.02268 (direct OA link when available)
- Concepts: Reinforcement learning, Computer science, Prior probability, Transfer of learning, Artificial intelligence, Set (abstract data type), Sample (material), Machine learning, Sample complexity, Transfer (computing), Bayesian probability, Programming language, Chromatography, Chemistry, Parallel computing (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 11 (total citation count in OpenAlex)
- Citations by year (recent): 2021: 2, 2020: 5, 2019: 4 (per-year citation counts, last 5 years)
- References (count): 2 (number of works referenced by this work)
- Related works (count): 20 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W2782656435 |
| doi | https://doi.org/10.48550/arxiv.1801.02268 |
| ids.doi | https://doi.org/10.48550/arxiv.1801.02268 |
| ids.mag | 2782656435 |
| ids.openalex | https://openalex.org/W2782656435 |
| fwci | |
| type | preprint |
| title | Sample-Efficient Reinforcement Learning through Transfer and Architectural Priors |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10462 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9998999834060669 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Reinforcement Learning in Robotics |
| topics[1].id | https://openalex.org/T11975 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9969000220298767 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Evolutionary Algorithms and Applications |
| topics[2].id | https://openalex.org/T11574 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9916999936103821 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Artificial Intelligence in Games |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C97541855 |
| concepts[0].level | 2 |
| concepts[0].score | 0.8312558531761169 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q830687 |
| concepts[0].display_name | Reinforcement learning |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.7848619222640991 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C177769412 |
| concepts[2].level | 3 |
| concepts[2].score | 0.7768038511276245 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q278090 |
| concepts[2].display_name | Prior probability |
| concepts[3].id | https://openalex.org/C150899416 |
| concepts[3].level | 2 |
| concepts[3].score | 0.674079179763794 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q1820378 |
| concepts[3].display_name | Transfer of learning |
| concepts[4].id | https://openalex.org/C154945302 |
| concepts[4].level | 1 |
| concepts[4].score | 0.6316269040107727 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[4].display_name | Artificial intelligence |
| concepts[5].id | https://openalex.org/C177264268 |
| concepts[5].level | 2 |
| concepts[5].score | 0.6277016401290894 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q1514741 |
| concepts[5].display_name | Set (abstract data type) |
| concepts[6].id | https://openalex.org/C198531522 |
| concepts[6].level | 2 |
| concepts[6].score | 0.6051440834999084 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q485146 |
| concepts[6].display_name | Sample (material) |
| concepts[7].id | https://openalex.org/C119857082 |
| concepts[7].level | 1 |
| concepts[7].score | 0.5283917784690857 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q2539 |
| concepts[7].display_name | Machine learning |
| concepts[8].id | https://openalex.org/C2778445095 |
| concepts[8].level | 2 |
| concepts[8].score | 0.4825855493545532 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q18354077 |
| concepts[8].display_name | Sample complexity |
| concepts[9].id | https://openalex.org/C2776175482 |
| concepts[9].level | 2 |
| concepts[9].score | 0.4485432803630829 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q1195816 |
| concepts[9].display_name | Transfer (computing) |
| concepts[10].id | https://openalex.org/C107673813 |
| concepts[10].level | 2 |
| concepts[10].score | 0.10039982199668884 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q812534 |
| concepts[10].display_name | Bayesian probability |
| concepts[11].id | https://openalex.org/C199360897 |
| concepts[11].level | 1 |
| concepts[11].score | 0.0 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q9143 |
| concepts[11].display_name | Programming language |
| concepts[12].id | https://openalex.org/C43617362 |
| concepts[12].level | 1 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q170050 |
| concepts[12].display_name | Chromatography |
| concepts[13].id | https://openalex.org/C185592680 |
| concepts[13].level | 0 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q2329 |
| concepts[13].display_name | Chemistry |
| concepts[14].id | https://openalex.org/C173608175 |
| concepts[14].level | 1 |
| concepts[14].score | 0.0 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q232661 |
| concepts[14].display_name | Parallel computing |
| keywords[0].id | https://openalex.org/keywords/reinforcement-learning |
| keywords[0].score | 0.8312558531761169 |
| keywords[0].display_name | Reinforcement learning |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.7848619222640991 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/prior-probability |
| keywords[2].score | 0.7768038511276245 |
| keywords[2].display_name | Prior probability |
| keywords[3].id | https://openalex.org/keywords/transfer-of-learning |
| keywords[3].score | 0.674079179763794 |
| keywords[3].display_name | Transfer of learning |
| keywords[4].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[4].score | 0.6316269040107727 |
| keywords[4].display_name | Artificial intelligence |
| keywords[5].id | https://openalex.org/keywords/set |
| keywords[5].score | 0.6277016401290894 |
| keywords[5].display_name | Set (abstract data type) |
| keywords[6].id | https://openalex.org/keywords/sample |
| keywords[6].score | 0.6051440834999084 |
| keywords[6].display_name | Sample (material) |
| keywords[7].id | https://openalex.org/keywords/machine-learning |
| keywords[7].score | 0.5283917784690857 |
| keywords[7].display_name | Machine learning |
| keywords[8].id | https://openalex.org/keywords/sample-complexity |
| keywords[8].score | 0.4825855493545532 |
| keywords[8].display_name | Sample complexity |
| keywords[9].id | https://openalex.org/keywords/transfer |
| keywords[9].score | 0.4485432803630829 |
| keywords[9].display_name | Transfer (computing) |
| keywords[10].id | https://openalex.org/keywords/bayesian-probability |
| keywords[10].score | 0.10039982199668884 |
| keywords[10].display_name | Bayesian probability |
| language | en |
| locations[0].id | mag:2782656435 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | |
| locations[0].version | submittedVersion |
| locations[0].raw_type | |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | arXiv (Cornell University) |
| locations[0].landing_page_url | https://arxiv.org/pdf/1801.02268 |
| locations[1].id | doi:10.48550/arxiv.1801.02268 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.1801.02268 |
| indexed_in | datacite |
| authorships[0].author.id | https://openalex.org/A5004675499 |
| authorships[0].author.orcid | https://orcid.org/0000-0003-0468-5986 |
| authorships[0].author.display_name | Benjamin Spector |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Benjamin Spector |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5018609918 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-0388-5217 |
| authorships[1].author.display_name | Serge Belongie |
| authorships[1].author_position | last |
| authorships[1].raw_author_name | Serge J. Belongie |
| authorships[1].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/1801.02268 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Sample-Efficient Reinforcement Learning through Transfer and Architectural Priors |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10462 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9998999834060669 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Reinforcement Learning in Robotics |
| related_works | https://openalex.org/W2121863487, https://openalex.org/W2145339207, https://openalex.org/W1757796397, https://openalex.org/W2257979135, https://openalex.org/W2097381042, https://openalex.org/W3095548673, https://openalex.org/W2521274174, https://openalex.org/W2902567911, https://openalex.org/W2344013593, https://openalex.org/W2999490157, https://openalex.org/W2294805292, https://openalex.org/W3131034247, https://openalex.org/W3025133903, https://openalex.org/W3008295237, https://openalex.org/W1882226547, https://openalex.org/W2790924949, https://openalex.org/W1587813652, https://openalex.org/W2741926431, https://openalex.org/W2950197980, https://openalex.org/W3016913676 |
| cited_by_count | 11 |
| counts_by_year[0].year | 2021 |
| counts_by_year[0].cited_by_count | 2 |
| counts_by_year[1].year | 2020 |
| counts_by_year[1].cited_by_count | 5 |
| counts_by_year[2].year | 2019 |
| counts_by_year[2].cited_by_count | 4 |
| locations_count | 2 |
| best_oa_location.id | mag:2782656435 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | arXiv (Cornell University) |
| best_oa_location.landing_page_url | https://arxiv.org/pdf/1801.02268 |
| primary_location.id | mag:2782656435 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | |
| primary_location.version | submittedVersion |
| primary_location.raw_type | |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | arXiv (Cornell University) |
| primary_location.landing_page_url | https://arxiv.org/pdf/1801.02268 |
| publication_date | 2018-01-07 |
| publication_year | 2018 |
| referenced_works | https://openalex.org/W2583761661, https://openalex.org/W1757796397 |
| referenced_works_count | 2 |
| abstract_inverted_index.a | 116, 122 |
| abstract_inverted_index.as | 14 |
| abstract_inverted_index.by | 23, 99 |
| abstract_inverted_index.in | 2, 35, 87 |
| abstract_inverted_index.is | 53 |
| abstract_inverted_index.it | 68 |
| abstract_inverted_index.of | 32, 44, 65, 81, 96, 109 |
| abstract_inverted_index.on | 121 |
| abstract_inverted_index.to | 9, 37, 63, 71, 85 |
| abstract_inverted_index.50x | 118 |
| abstract_inverted_index.One | 49 |
| abstract_inverted_index.aid | 86 |
| abstract_inverted_index.and | 91 |
| abstract_inverted_index.are | 61 |
| abstract_inverted_index.but | 26 |
| abstract_inverted_index.can | 103 |
| abstract_inverted_index.for | 51 |
| abstract_inverted_index.has | 6 |
| abstract_inverted_index.new | 73 |
| abstract_inverted_index.the | 20, 24, 79, 94, 107 |
| abstract_inverted_index.toy | 123 |
| abstract_inverted_index.2600 | 16 |
| abstract_inverted_index.This | 75, 113 |
| abstract_inverted_index.deep | 3 |
| abstract_inverted_index.five | 42 |
| abstract_inverted_index.from | 19, 106 |
| abstract_inverted_index.game | 83 |
| abstract_inverted_index.idea | 80 |
| abstract_inverted_index.just | 18 |
| abstract_inverted_index.much | 69 |
| abstract_inverted_index.sets | 84 |
| abstract_inverted_index.such | 13 |
| abstract_inverted_index.than | 47 |
| abstract_inverted_index.that | 54, 60, 101 |
| abstract_inverted_index.them | 40 |
| abstract_inverted_index.then | 92 |
| abstract_inverted_index.this | 52 |
| abstract_inverted_index.work | 1 |
| abstract_inverted_index.Atari | 15 |
| abstract_inverted_index.build | 56 |
| abstract_inverted_index.first | 77 |
| abstract_inverted_index.game, | 25 |
| abstract_inverted_index.games | 17 |
| abstract_inverted_index.learn | 10 |
| abstract_inverted_index.order | 36 |
| abstract_inverted_index.paper | 76 |
| abstract_inverted_index.steps | 34 |
| abstract_inverted_index.tasks | 12 |
| abstract_inverted_index.these | 27 |
| abstract_inverted_index.Recent | 0 |
| abstract_inverted_index.easier | 70 |
| abstract_inverted_index.humans | 55 |
| abstract_inverted_index.learn, | 38 |
| abstract_inverted_index.making | 39, 67 |
| abstract_inverted_index.models | 102 |
| abstract_inverted_index.orders | 43 |
| abstract_inverted_index.reason | 50 |
| abstract_inverted_index.reward | 21 |
| abstract_inverted_index.robust | 57 |
| abstract_inverted_index.shared | 58, 97 |
| abstract_inverted_index.slower | 46 |
| abstract_inverted_index.affords | 115 |
| abstract_inverted_index.allowed | 7 |
| abstract_inverted_index.benefit | 105 |
| abstract_inverted_index.complex | 11 |
| abstract_inverted_index.humans. | 48 |
| abstract_inverted_index.priors. | 112 |
| abstract_inverted_index.require | 30 |
| abstract_inverted_index.showing | 100 |
| abstract_inverted_index.utility | 95 |
| abstract_inverted_index.learning | 5, 89 |
| abstract_inverted_index.millions | 31 |
| abstract_inverted_index.positive | 119 |
| abstract_inverted_index.provided | 22 |
| abstract_inverted_index.relevant | 110 |
| abstract_inverted_index.training | 33 |
| abstract_inverted_index.transfer | 88, 120 |
| abstract_inverted_index.magnitude | 45 |
| abstract_inverted_index.presently | 29 |
| abstract_inverted_index.problems, | 66 |
| abstract_inverted_index.research, | 90 |
| abstract_inverted_index.technique | 114 |
| abstract_inverted_index.variants. | 74 |
| abstract_inverted_index.algorithms | 8, 28 |
| abstract_inverted_index.applicable | 62 |
| abstract_inverted_index.assimilate | 72 |
| abstract_inverted_index.introduces | 78 |
| abstract_inverted_index.remarkable | 117 |
| abstract_inverted_index.collections | 64 |
| abstract_inverted_index.demonstrates | 93 |
| abstract_inverted_index.problem-set. | 124 |
| abstract_inverted_index.approximately | 41 |
| abstract_inverted_index.architectural | 111 |
| abstract_inverted_index.incorporation | 108 |
| abstract_inverted_index.reinforcement | 4 |
| abstract_inverted_index.substantially | 104 |
| abstract_inverted_index.representations | 59, 98 |
| abstract_inverted_index.automatically-generated | 82 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 2 |
| citation_normalized_percentile |
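
The `abstract_inverted_index.*` rows in the payload store the abstract as a mapping from each token to the zero-based positions where it occurs. The following is a minimal sketch of how such a mapping can be turned back into running text, assuming the rows have already been loaded into a Python dict. The names `rebuild_abstract` and `sample` are illustrative and not part of any OpenAlex client, and `sample` keeps only positions 0 to 5 of the listed tokens for brevity.

```python
from typing import Dict, List


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Rebuild text from a token -> list-of-positions mapping."""
    # Invert the mapping: each position holds exactly one token.
    token_at: Dict[int, str] = {}
    for token, positions in inverted_index.items():
        for position in positions:
            token_at[position] = token
    # Emit tokens in position order, separated by single spaces.
    return " ".join(token_at[i] for i in sorted(token_at))


# First six tokens of the abstract, taken from the table rows.
# Later positions of "in" (35, 87) and "learning" (89) are omitted
# here so that the sample stays contiguous.
sample = {
    "Recent": [0],
    "work": [1],
    "in": [2],
    "deep": [3],
    "reinforcement": [4],
    "learning": [5],
}

print(rebuild_abstract(sample))
```

Applied to the full set of rows (positions 0 through 124), the same function should reproduce the abstract shown at the top of this record, since punctuation is stored as part of each token.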