Differentiable Architecture Search for Reinforcement Learning
2021 · Open Access · DOI: https://doi.org/10.48550/arxiv.2106.02229
In this paper, we investigate a fundamental question: to what extent are gradient-based neural architecture search (NAS) techniques applicable to reinforcement learning (RL)? Using the original DARTS as a convenient baseline, we find that the discrete architectures it discovers can achieve up to 250% of the performance of manual architecture designs, on both discrete and continuous action-space environments and across both off-policy and on-policy RL algorithms, at only 3x more computation time. Furthermore, through numerous ablation studies, we systematically verify not only that DARTS correctly upweights operations during its supernet phase, but also that it gradually improves the resulting discrete cells up to 30x more efficiently than random search, suggesting that DARTS is, surprisingly, an effective tool for improving architectures in RL.
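The abstract relies on the continuous relaxation used by DARTS: during the supernet phase each edge outputs a softmax(α)-weighted mixture of candidate operations, and the final discrete cell keeps only the highest-weighted operation. The sketch below illustrates that mechanism on scalars. It is a toy under stated assumptions, not the authors' RL implementation: the four candidate operations (`zero`, `identity`, `negate`, `scale2`), the regression target 2x, and the helpers `mixed_op`, `train_alpha` and `discretize` are hypothetical, and only α is trained, so the bilevel split between network weights and architecture parameters in DARTS is omitted.

```python
import math
from typing import Callable, Dict, List

# Hypothetical toy candidate ops on a scalar; a real DARTS cell would use
# convolutions, pooling, skip connections and a zero op on tensors.
CANDIDATE_OPS: Dict[str, Callable[[float], float]] = {
    "zero": lambda x: 0.0,
    "identity": lambda x: x,
    "negate": lambda x: -x,
    "scale2": lambda x: 2.0 * x,
}
OP_NAMES: List[str] = list(CANDIDATE_OPS)


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over the architecture logits alpha."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_op(x: float, alpha: List[float]) -> float:
    """Continuous relaxation: softmax(alpha)-weighted sum of all candidate ops."""
    weights = softmax(alpha)
    return sum(w * CANDIDATE_OPS[name](x) for w, name in zip(weights, OP_NAMES))


def train_alpha(xs: List[float], targets: List[float],
                steps: int = 100, lr: float = 0.1) -> List[float]:
    """Plain gradient descent on alpha for a summed squared-error loss.

    Uses d mixed_op / d alpha_k = w_k * (o_k(x) - mixed_op(x)), i.e. the
    softmax Jacobian applied to the candidate op outputs.
    """
    alpha = [0.0] * len(OP_NAMES)
    for _ in range(steps):
        weights = softmax(alpha)
        grad = [0.0] * len(alpha)
        for x, t in zip(xs, targets):
            outs = [CANDIDATE_OPS[name](x) for name in OP_NAMES]
            y = sum(w * o for w, o in zip(weights, outs))
            for k in range(len(alpha)):
                grad[k] += 2.0 * (y - t) * weights[k] * (outs[k] - y)
        alpha = [a - lr * g for a, g in zip(alpha, grad)]
    return alpha


def discretize(alpha: List[float]) -> str:
    """Discrete cell: keep only the operation with the largest alpha."""
    best = max(range(len(alpha)), key=lambda k: alpha[k])
    return OP_NAMES[best]


xs = [-2.0, -1.0, 1.0, 2.0]
targets = [2.0 * x for x in xs]
learned_alpha = train_alpha(xs, targets)
print([round(w, 3) for w in softmax(learned_alpha)])
print(discretize(learned_alpha))  # scale2
```

Because the target is 2x, every gradient step raises the logit of `scale2` relative to the others, so the discretized cell keeps that operation; this loosely mirrors, at toy scale, the "correctly upweights operations" check described in the abstract.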
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2106.02229 (PDF: https://arxiv.org/pdf/2106.02229)
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4287125796
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4287125796 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2106.02229 (Digital Object Identifier)
- Title: Differentiable Architecture Search for Reinforcement Learning (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2021 (year of publication)
- Publication date: 2021-06-04 (full publication date if available)
- Authors: Yingjie Miao, Xingyou Song, John D. Co-Reyes, Daiyi Peng, Summer Yue, Eugene Brevdo, Aleksandra Faust (list of authors in order)
- Landing page: https://arxiv.org/abs/2106.02229 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2106.02229 (direct link to full-text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2106.02229 (direct OA link when available)
- Concepts: Reinforcement learning, Computer science, Architecture, Computation, Differentiable function, Phrase, Artificial intelligence, Machine learning, Theoretical computer science, Algorithm, Mathematics, Art, Visual arts, Mathematical analysis (top fields/topics attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4287125796 |
| doi | https://doi.org/10.48550/arxiv.2106.02229 |
| ids.doi | https://doi.org/10.48550/arxiv.2106.02229 |
| ids.openalex | https://openalex.org/W4287125796 |
| fwci | 0.0 |
| type | preprint |
| title | Differentiable Architecture Search for Reinforcement Learning |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10462 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9829999804496765 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Reinforcement Learning in Robotics |
| topics[1].id | https://openalex.org/T11689 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9387999773025513 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Adversarial Robustness in Machine Learning |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C97541855 |
| concepts[0].level | 2 |
| concepts[0].score | 0.8586028814315796 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q830687 |
| concepts[0].display_name | Reinforcement learning |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.7378206253051758 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C123657996 |
| concepts[2].level | 2 |
| concepts[2].score | 0.7096469402313232 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q12271 |
| concepts[2].display_name | Architecture |
| concepts[3].id | https://openalex.org/C45374587 |
| concepts[3].level | 2 |
| concepts[3].score | 0.5863551497459412 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q12525525 |
| concepts[3].display_name | Computation |
| concepts[4].id | https://openalex.org/C202615002 |
| concepts[4].level | 2 |
| concepts[4].score | 0.5551665425300598 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q783507 |
| concepts[4].display_name | Differentiable function |
| concepts[5].id | https://openalex.org/C2776224158 |
| concepts[5].level | 2 |
| concepts[5].score | 0.5002026557922363 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q187931 |
| concepts[5].display_name | Phrase |
| concepts[6].id | https://openalex.org/C154945302 |
| concepts[6].level | 1 |
| concepts[6].score | 0.4703790247440338 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[6].display_name | Artificial intelligence |
| concepts[7].id | https://openalex.org/C119857082 |
| concepts[7].level | 1 |
| concepts[7].score | 0.34140321612358093 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q2539 |
| concepts[7].display_name | Machine learning |
| concepts[8].id | https://openalex.org/C80444323 |
| concepts[8].level | 1 |
| concepts[8].score | 0.3376169204711914 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q2878974 |
| concepts[8].display_name | Theoretical computer science |
| concepts[9].id | https://openalex.org/C11413529 |
| concepts[9].level | 1 |
| concepts[9].score | 0.21968910098075867 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q8366 |
| concepts[9].display_name | Algorithm |
| concepts[10].id | https://openalex.org/C33923547 |
| concepts[10].level | 0 |
| concepts[10].score | 0.1271568238735199 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[10].display_name | Mathematics |
| concepts[11].id | https://openalex.org/C142362112 |
| concepts[11].level | 0 |
| concepts[11].score | 0.0 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q735 |
| concepts[11].display_name | Art |
| concepts[12].id | https://openalex.org/C153349607 |
| concepts[12].level | 1 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q36649 |
| concepts[12].display_name | Visual arts |
| concepts[13].id | https://openalex.org/C134306372 |
| concepts[13].level | 1 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q7754 |
| concepts[13].display_name | Mathematical analysis |
| keywords[0].id | https://openalex.org/keywords/reinforcement-learning |
| keywords[0].score | 0.8586028814315796 |
| keywords[0].display_name | Reinforcement learning |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.7378206253051758 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/architecture |
| keywords[2].score | 0.7096469402313232 |
| keywords[2].display_name | Architecture |
| keywords[3].id | https://openalex.org/keywords/computation |
| keywords[3].score | 0.5863551497459412 |
| keywords[3].display_name | Computation |
| keywords[4].id | https://openalex.org/keywords/differentiable-function |
| keywords[4].score | 0.5551665425300598 |
| keywords[4].display_name | Differentiable function |
| keywords[5].id | https://openalex.org/keywords/phrase |
| keywords[5].score | 0.5002026557922363 |
| keywords[5].display_name | Phrase |
| keywords[6].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[6].score | 0.4703790247440338 |
| keywords[6].display_name | Artificial intelligence |
| keywords[7].id | https://openalex.org/keywords/machine-learning |
| keywords[7].score | 0.34140321612358093 |
| keywords[7].display_name | Machine learning |
| keywords[8].id | https://openalex.org/keywords/theoretical-computer-science |
| keywords[8].score | 0.3376169204711914 |
| keywords[8].display_name | Theoretical computer science |
| keywords[9].id | https://openalex.org/keywords/algorithm |
| keywords[9].score | 0.21968910098075867 |
| keywords[9].display_name | Algorithm |
| keywords[10].id | https://openalex.org/keywords/mathematics |
| keywords[10].score | 0.1271568238735199 |
| keywords[10].display_name | Mathematics |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2106.02229 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2106.02229 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2106.02229 |
| locations[1].id | doi:10.48550/arxiv.2106.02229 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2106.02229 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5039869395 |
| authorships[0].author.orcid | https://orcid.org/0000-0001-6908-0182 |
| authorships[0].author.display_name | Yingjie Miao |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Miao, Yingjie |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5081034298 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-6055-3174 |
| authorships[1].author.display_name | Xingyou Song |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Song, Xingyou |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5007992087 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | John D. Co-Reyes |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Co-Reyes, John D. |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5049565925 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Daiyi Peng |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Peng, Daiyi |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5050610019 |
| authorships[4].author.orcid | |
| authorships[4].author.display_name | Summer Yue |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Yue, Summer |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5047736127 |
| authorships[5].author.orcid | https://orcid.org/0009-0005-7965-3534 |
| authorships[5].author.display_name | Eugene Brevdo |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Brevdo, Eugene |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5002971435 |
| authorships[6].author.orcid | https://orcid.org/0000-0002-3268-8685 |
| authorships[6].author.display_name | Aleksandra Faust |
| authorships[6].author_position | last |
| authorships[6].raw_author_name | Faust, Aleksandra |
| authorships[6].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2106.02229 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Differentiable Architecture Search for Reinforcement Learning |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10462 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9829999804496765 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Reinforcement Learning in Robotics |
| related_works | https://openalex.org/W4285277090, https://openalex.org/W4327738859, https://openalex.org/W2039546652, https://openalex.org/W2348722996, https://openalex.org/W2334570605, https://openalex.org/W3181683615, https://openalex.org/W4286826125, https://openalex.org/W1633485514, https://openalex.org/W1604739066, https://openalex.org/W2115878407 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2106.02229 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2106.02229 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2106.02229 |
| primary_location.id | pmh:oai:arXiv.org:2106.02229 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2106.02229 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2106.02229 |
| publication_date | 2021-06-04 |
| publication_year | 2021 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 26 |
| abstract_inverted_index.3x | 63 |
| abstract_inverted_index.In | 0 |
| abstract_inverted_index.RL | 59 |
| abstract_inverted_index.To | 8 |
| abstract_inverted_index.an | 106 |
| abstract_inverted_index.as | 25 |
| abstract_inverted_index.at | 61 |
| abstract_inverted_index.in | 112 |
| abstract_inverted_index.is | 104 |
| abstract_inverted_index.on | 47 |
| abstract_inverted_index.to | 19, 39, 43, 95 |
| abstract_inverted_index.up | 38, 94 |
| abstract_inverted_index.we | 3, 29, 72 |
| abstract_inverted_index.30x | 96 |
| abstract_inverted_index.RL. | 113 |
| abstract_inverted_index.RL? | 20 |
| abstract_inverted_index.and | 50, 57 |
| abstract_inverted_index.are | 11 |
| abstract_inverted_index.but | 87 |
| abstract_inverted_index.can | 36 |
| abstract_inverted_index.for | 109 |
| abstract_inverted_index.its | 84 |
| abstract_inverted_index.not | 76 |
| abstract_inverted_index.the | 5, 22, 32 |
| abstract_inverted_index.250% | 40 |
| abstract_inverted_index.also | 88 |
| abstract_inverted_index.both | 48 |
| abstract_inverted_index.does | 78 |
| abstract_inverted_index.more | 64, 97 |
| abstract_inverted_index.only | 62, 77 |
| abstract_inverted_index.than | 99 |
| abstract_inverted_index.that | 31, 75 |
| abstract_inverted_index.this | 1 |
| abstract_inverted_index.tool | 108 |
| abstract_inverted_index.what | 9 |
| abstract_inverted_index.(NAS) | 16 |
| abstract_inverted_index.DARTS | 24, 79, 103 |
| abstract_inverted_index.Using | 21 |
| abstract_inverted_index.cells | 93 |
| abstract_inverted_index.found | 35 |
| abstract_inverted_index.space | 53 |
| abstract_inverted_index.time. | 66 |
| abstract_inverted_index.across | 55 |
| abstract_inverted_index.action | 52 |
| abstract_inverted_index.during | 83 |
| abstract_inverted_index.extent | 10 |
| abstract_inverted_index.manual | 44 |
| abstract_inverted_index.neural | 13 |
| abstract_inverted_index.paper, | 2 |
| abstract_inverted_index.random | 100 |
| abstract_inverted_index.search | 15 |
| abstract_inverted_index.verify | 74 |
| abstract_inverted_index.achieve | 37 |
| abstract_inverted_index.designs | 46 |
| abstract_inverted_index.phrase, | 86 |
| abstract_inverted_index.search, | 101 |
| abstract_inverted_index.through | 68 |
| abstract_inverted_index.ablation | 70 |
| abstract_inverted_index.compared | 42 |
| abstract_inverted_index.discover | 30 |
| abstract_inverted_index.discrete | 33, 49, 92 |
| abstract_inverted_index.improves | 90 |
| abstract_inverted_index.numerous | 69 |
| abstract_inverted_index.original | 23 |
| abstract_inverted_index.studies, | 71 |
| abstract_inverted_index.supernet | 85 |
| abstract_inverted_index.upweight | 81 |
| abstract_inverted_index.baseline, | 28 |
| abstract_inverted_index.correctly | 80 |
| abstract_inverted_index.effective | 107 |
| abstract_inverted_index.gradually | 89 |
| abstract_inverted_index.improving | 110 |
| abstract_inverted_index.on-policy | 58 |
| abstract_inverted_index.question: | 7 |
| abstract_inverted_index.resulting | 91 |
| abstract_inverted_index.applicable | 18 |
| abstract_inverted_index.continuous | 51 |
| abstract_inverted_index.convenient | 27 |
| abstract_inverted_index.off-policy | 56 |
| abstract_inverted_index.operations | 82 |
| abstract_inverted_index.suggesting | 102 |
| abstract_inverted_index.techniques | 17 |
| abstract_inverted_index.algorithms, | 60 |
| abstract_inverted_index.computation | 65 |
| abstract_inverted_index.efficiently | 98 |
| abstract_inverted_index.fundamental | 6 |
| abstract_inverted_index.investigate | 4 |
| abstract_inverted_index.performance | 41 |
| abstract_inverted_index.Furthermore, | 67 |
| abstract_inverted_index.architecture | 14, 45 |
| abstract_inverted_index.environments | 54 |
| abstract_inverted_index.surprisingly | 105 |
| abstract_inverted_index.architectures | 34, 111 |
| abstract_inverted_index.gradient-based | 12 |
| abstract_inverted_index.systematically | 73 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 7 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/9 |
| sustainable_development_goals[0].score | 0.4699999988079071 |
| sustainable_development_goals[0].display_name | Industry, innovation and infrastructure |
| citation_normalized_percentile | |
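The `abstract_inverted_index` rows in the payload store the abstract as a map from each token to the 0-based word positions where it occurs (for example, `we` appears at positions 3, 29 and 72). A minimal Python sketch for turning such a map back into running text follows; the function name `rebuild_abstract` is hypothetical, and the sample uses only the tokens at positions 0 to 7 from the table above.

```python
from typing import Dict, List


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Turn a token -> positions map back into space-separated text."""
    by_position: Dict[int, str] = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    # Emit tokens in position order; positions absent from the map are skipped.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Entries for positions 0-7 of this record's abstract_inverted_index.
sample = {
    "In": [0],
    "this": [1],
    "paper,": [2],
    "we": [3],
    "investigate": [4],
    "the": [5],
    "fundamental": [6],
    "question:": [7],
}
print(rebuild_abstract(sample))  # In this paper, we investigate the fundamental question:
```

Joining with single spaces is an approximation: the index keeps punctuation attached to its token (e.g. `paper,`) but does not record the original whitespace.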