Differentiable Architecture Search for Reinforcement Learning

Yingjie Miao, Xingyou Song, John D. Co-Reyes, Daiyi Peng, Summer Yue, Eugene Brevdo, Aleksandra Faust

2021 · Open Access · DOI: https://doi.org/10.48550/arxiv.2106.02229

In this paper, we investigate the fundamental question: to what extent are gradient-based neural architecture search (NAS) techniques applicable to reinforcement learning (RL)? Using the original DARTS as a convenient baseline, we discover that the discrete architectures found can achieve up to 250% performance compared to manual architecture designs on both discrete and continuous action space environments, across off-policy and on-policy RL algorithms, at only 3x more computation time. Furthermore, through numerous ablation studies, we systematically verify that DARTS not only correctly upweights operations during its supernet phase, but also gradually improves the resulting discrete cells up to 30x more efficiently than random search, suggesting that DARTS is a surprisingly effective tool for improving architectures in RL.
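
The abstract refers to two DARTS stages: a supernet phase in which candidate operations are weighted, and a later discrete cell. The following is a minimal sketch of that idea for a single edge, not the authors' implementation. The candidate operations, the function names, and the example weights are all hypothetical, and the gradient-based update of the architecture weights is omitted.

```python
import math

# Hypothetical candidate operations for one edge of a cell (illustrative only).
CANDIDATE_OPS = {
    "identity": lambda x: x,
    "relu": lambda x: max(0.0, x),
    "tanh": math.tanh,
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_op(x, alphas):
    """Supernet phase: softmax-weighted sum of every candidate operation."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, CANDIDATE_OPS.values()))


def discretize(alphas):
    """Discrete cell: keep only the operation with the largest weight."""
    names = list(CANDIDATE_OPS)
    best = max(range(len(alphas)), key=lambda i: alphas[i])
    return names[best]


if __name__ == "__main__":
    alphas = [0.0, 2.0, 0.5]  # assumed architecture weights after search
    print(mixed_op(1.0, alphas))
    print(discretize(alphas))
```

In this sketch, "upweighting" an operation corresponds to its entry in `alphas` growing during training, so that it dominates the mixture and is the one retained at discretization.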

Related Topics

Reinforcement Learning

Computer Science

Architecture

Artificial Intelligence

Machine Learning

Theoretical Computer Science

Mathematical Analysis

Concepts

Reinforcement learning, Computer science, Architecture, Computation, Differentiable function, Phrase, Artificial intelligence, Machine learning, Theoretical computer science, Algorithm, Mathematics, Art, Visual arts, Mathematical analysis

Metadata

Type: preprint
Language: en
Landing Page: http://arxiv.org/abs/2106.02229
PDF: https://arxiv.org/pdf/2106.02229
OA Status: green
Related Works: 10
OpenAlex ID: https://openalex.org/W4287125796

All OpenAlex metadata

OpenAlex ID: https://openalex.org/W4287125796

Canonical identifier for this work in OpenAlex
DOI: https://doi.org/10.48550/arxiv.2106.02229

Digital Object Identifier
Title: Differentiable Architecture Search for Reinforcement Learning

Work title
Type: preprint

OpenAlex work type
Language: en

Primary language
Publication year: 2021

Year of publication
Publication date: 2021-06-04

Full publication date if available
Authors: Yingjie Miao, Xingyou Song, John D. Co-Reyes, Daiyi Peng, Summer Yue, Eugene Brevdo, Aleksandra Faust

List of authors in order
Landing page: https://arxiv.org/abs/2106.02229

Publisher landing page
PDF URL: https://arxiv.org/pdf/2106.02229

Direct link to full text PDF
Open access: Yes

Whether a free full text is available
OA status: green

Open access status per OpenAlex
OA URL: https://arxiv.org/pdf/2106.02229

Direct OA link when available
Concepts: Reinforcement learning, Computer science, Architecture, Computation, Differentiable function, Phrase, Artificial intelligence, Machine learning, Theoretical computer science, Algorithm, Mathematics, Art, Visual arts, Mathematical analysis

Top concepts (fields/topics) attached by OpenAlex
Cited by: 0

Total citation count in OpenAlex
Related works (count): 10

Other works algorithmically related by OpenAlex

Full payload

id	https://openalex.org/W4287125796
doi	https://doi.org/10.48550/arxiv.2106.02229
ids.doi	https://doi.org/10.48550/arxiv.2106.02229
ids.openalex	https://openalex.org/W4287125796
fwci	0.0
type	preprint
title	Differentiable Architecture Search for Reinforcement Learning
biblio.issue
biblio.volume
biblio.last_page
biblio.first_page
topics[0].id	https://openalex.org/T10462
topics[0].field.id	https://openalex.org/fields/17
topics[0].field.display_name	Computer Science
topics[0].score	0.9829999804496765
topics[0].domain.id	https://openalex.org/domains/3
topics[0].domain.display_name	Physical Sciences
topics[0].subfield.id	https://openalex.org/subfields/1702
topics[0].subfield.display_name	Artificial Intelligence
topics[0].display_name	Reinforcement Learning in Robotics
topics[1].id	https://openalex.org/T11689
topics[1].field.id	https://openalex.org/fields/17
topics[1].field.display_name	Computer Science
topics[1].score	0.9387999773025513
topics[1].domain.id	https://openalex.org/domains/3
topics[1].domain.display_name	Physical Sciences
topics[1].subfield.id	https://openalex.org/subfields/1702
topics[1].subfield.display_name	Artificial Intelligence
topics[1].display_name	Adversarial Robustness in Machine Learning
is_xpac	False
apc_list
apc_paid
concepts[0].id	https://openalex.org/C97541855
concepts[0].level	2
concepts[0].score	0.8586028814315796
concepts[0].wikidata	https://www.wikidata.org/wiki/Q830687
concepts[0].display_name	Reinforcement learning
concepts[1].id	https://openalex.org/C41008148
concepts[1].level	0
concepts[1].score	0.7378206253051758
concepts[1].wikidata	https://www.wikidata.org/wiki/Q21198
concepts[1].display_name	Computer science
concepts[2].id	https://openalex.org/C123657996
concepts[2].level	2
concepts[2].score	0.7096469402313232
concepts[2].wikidata	https://www.wikidata.org/wiki/Q12271
concepts[2].display_name	Architecture
concepts[3].id	https://openalex.org/C45374587
concepts[3].level	2
concepts[3].score	0.5863551497459412
concepts[3].wikidata	https://www.wikidata.org/wiki/Q12525525
concepts[3].display_name	Computation
concepts[4].id	https://openalex.org/C202615002
concepts[4].level	2
concepts[4].score	0.5551665425300598
concepts[4].wikidata	https://www.wikidata.org/wiki/Q783507
concepts[4].display_name	Differentiable function
concepts[5].id	https://openalex.org/C2776224158
concepts[5].level	2
concepts[5].score	0.5002026557922363
concepts[5].wikidata	https://www.wikidata.org/wiki/Q187931
concepts[5].display_name	Phrase
concepts[6].id	https://openalex.org/C154945302
concepts[6].level	1
concepts[6].score	0.4703790247440338
concepts[6].wikidata	https://www.wikidata.org/wiki/Q11660
concepts[6].display_name	Artificial intelligence
concepts[7].id	https://openalex.org/C119857082
concepts[7].level	1
concepts[7].score	0.34140321612358093
concepts[7].wikidata	https://www.wikidata.org/wiki/Q2539
concepts[7].display_name	Machine learning
concepts[8].id	https://openalex.org/C80444323
concepts[8].level	1
concepts[8].score	0.3376169204711914
concepts[8].wikidata	https://www.wikidata.org/wiki/Q2878974
concepts[8].display_name	Theoretical computer science
concepts[9].id	https://openalex.org/C11413529
concepts[9].level	1
concepts[9].score	0.21968910098075867
concepts[9].wikidata	https://www.wikidata.org/wiki/Q8366
concepts[9].display_name	Algorithm
concepts[10].id	https://openalex.org/C33923547
concepts[10].level	0
concepts[10].score	0.1271568238735199
concepts[10].wikidata	https://www.wikidata.org/wiki/Q395
concepts[10].display_name	Mathematics
concepts[11].id	https://openalex.org/C142362112
concepts[11].level	0
concepts[11].score	0.0
concepts[11].wikidata	https://www.wikidata.org/wiki/Q735
concepts[11].display_name	Art
concepts[12].id	https://openalex.org/C153349607
concepts[12].level	1
concepts[12].score	0.0
concepts[12].wikidata	https://www.wikidata.org/wiki/Q36649
concepts[12].display_name	Visual arts
concepts[13].id	https://openalex.org/C134306372
concepts[13].level	1
concepts[13].score	0.0
concepts[13].wikidata	https://www.wikidata.org/wiki/Q7754
concepts[13].display_name	Mathematical analysis
keywords[0].id	https://openalex.org/keywords/reinforcement-learning
keywords[0].score	0.8586028814315796
keywords[0].display_name	Reinforcement learning
keywords[1].id	https://openalex.org/keywords/computer-science
keywords[1].score	0.7378206253051758
keywords[1].display_name	Computer science
keywords[2].id	https://openalex.org/keywords/architecture
keywords[2].score	0.7096469402313232
keywords[2].display_name	Architecture
keywords[3].id	https://openalex.org/keywords/computation
keywords[3].score	0.5863551497459412
keywords[3].display_name	Computation
keywords[4].id	https://openalex.org/keywords/differentiable-function
keywords[4].score	0.5551665425300598
keywords[4].display_name	Differentiable function
keywords[5].id	https://openalex.org/keywords/phrase
keywords[5].score	0.5002026557922363
keywords[5].display_name	Phrase
keywords[6].id	https://openalex.org/keywords/artificial-intelligence
keywords[6].score	0.4703790247440338
keywords[6].display_name	Artificial intelligence
keywords[7].id	https://openalex.org/keywords/machine-learning
keywords[7].score	0.34140321612358093
keywords[7].display_name	Machine learning
keywords[8].id	https://openalex.org/keywords/theoretical-computer-science
keywords[8].score	0.3376169204711914
keywords[8].display_name	Theoretical computer science
keywords[9].id	https://openalex.org/keywords/algorithm
keywords[9].score	0.21968910098075867
keywords[9].display_name	Algorithm
keywords[10].id	https://openalex.org/keywords/mathematics
keywords[10].score	0.1271568238735199
keywords[10].display_name	Mathematics
language	en
locations[0].id	pmh:oai:arXiv.org:2106.02229
locations[0].is_oa	True
locations[0].source.id	https://openalex.org/S4306400194
locations[0].source.issn
locations[0].source.type	repository
locations[0].source.is_oa	True
locations[0].source.issn_l
locations[0].source.is_core	False
locations[0].source.is_in_doaj	False
locations[0].source.display_name	arXiv (Cornell University)
locations[0].source.host_organization	https://openalex.org/I205783295
locations[0].source.host_organization_name	Cornell University
locations[0].source.host_organization_lineage	https://openalex.org/I205783295
locations[0].license
locations[0].pdf_url	https://arxiv.org/pdf/2106.02229
locations[0].version	submittedVersion
locations[0].raw_type	text
locations[0].license_id
locations[0].is_accepted	False
locations[0].is_published	False
locations[0].raw_source_name
locations[0].landing_page_url	http://arxiv.org/abs/2106.02229
locations[1].id	doi:10.48550/arxiv.2106.02229
locations[1].is_oa	True
locations[1].source.id	https://openalex.org/S4306400194
locations[1].source.issn
locations[1].source.type	repository
locations[1].source.is_oa	True
locations[1].source.issn_l
locations[1].source.is_core	False
locations[1].source.is_in_doaj	False
locations[1].source.display_name	arXiv (Cornell University)
locations[1].source.host_organization	https://openalex.org/I205783295
locations[1].source.host_organization_name	Cornell University
locations[1].source.host_organization_lineage	https://openalex.org/I205783295
locations[1].license
locations[1].pdf_url
locations[1].version
locations[1].raw_type	article
locations[1].license_id
locations[1].is_accepted	False
locations[1].is_published
locations[1].raw_source_name
locations[1].landing_page_url	https://doi.org/10.48550/arxiv.2106.02229
indexed_in	arxiv, datacite
authorships[0].author.id	https://openalex.org/A5039869395
authorships[0].author.orcid	https://orcid.org/0000-0001-6908-0182
authorships[0].author.display_name	Yingjie Miao
authorships[0].author_position	first
authorships[0].raw_author_name	Miao, Yingjie
authorships[0].is_corresponding	False
authorships[1].author.id	https://openalex.org/A5081034298
authorships[1].author.orcid	https://orcid.org/0000-0001-6055-3174
authorships[1].author.display_name	Xingyou Song
authorships[1].author_position	middle
authorships[1].raw_author_name	Song, Xingyou
authorships[1].is_corresponding	False
authorships[2].author.id	https://openalex.org/A5007992087
authorships[2].author.orcid
authorships[2].author.display_name	John D. Co-Reyes
authorships[2].author_position	middle
authorships[2].raw_author_name	Co-Reyes, John D.
authorships[2].is_corresponding	False
authorships[3].author.id	https://openalex.org/A5049565925
authorships[3].author.orcid
authorships[3].author.display_name	Daiyi Peng
authorships[3].author_position	middle
authorships[3].raw_author_name	Peng, Daiyi
authorships[3].is_corresponding	False
authorships[4].author.id	https://openalex.org/A5050610019
authorships[4].author.orcid
authorships[4].author.display_name	Summer Yue
authorships[4].author_position	middle
authorships[4].raw_author_name	Yue, Summer
authorships[4].is_corresponding	False
authorships[5].author.id	https://openalex.org/A5047736127
authorships[5].author.orcid	https://orcid.org/0009-0005-7965-3534
authorships[5].author.display_name	Eugene Brevdo
authorships[5].author_position	middle
authorships[5].raw_author_name	Brevdo, Eugene
authorships[5].is_corresponding	False
authorships[6].author.id	https://openalex.org/A5002971435
authorships[6].author.orcid	https://orcid.org/0000-0002-3268-8685
authorships[6].author.display_name	Aleksandra Faust
authorships[6].author_position	last
authorships[6].raw_author_name	Faust, Aleksandra
authorships[6].is_corresponding	False
has_content.pdf	False
has_content.grobid_xml	False
is_paratext	False
open_access.is_oa	True
open_access.oa_url	https://arxiv.org/pdf/2106.02229
open_access.oa_status	green
open_access.any_repository_has_fulltext	False
created_date	2025-10-10T00:00:00
display_name	Differentiable Architecture Search for Reinforcement Learning
has_fulltext	False
is_retracted	False
updated_date	2025-11-06T06:51:31.235846
primary_topic.id	https://openalex.org/T10462
primary_topic.field.id	https://openalex.org/fields/17
primary_topic.field.display_name	Computer Science
primary_topic.score	0.9829999804496765
primary_topic.domain.id	https://openalex.org/domains/3
primary_topic.domain.display_name	Physical Sciences
primary_topic.subfield.id	https://openalex.org/subfields/1702
primary_topic.subfield.display_name	Artificial Intelligence
primary_topic.display_name	Reinforcement Learning in Robotics
related_works	https://openalex.org/W4285277090, https://openalex.org/W4327738859, https://openalex.org/W2039546652, https://openalex.org/W2348722996, https://openalex.org/W2334570605, https://openalex.org/W3181683615, https://openalex.org/W4286826125, https://openalex.org/W1633485514, https://openalex.org/W1604739066, https://openalex.org/W2115878407
cited_by_count	0
locations_count	2
best_oa_location.id	pmh:oai:arXiv.org:2106.02229
best_oa_location.is_oa	True
best_oa_location.source.id	https://openalex.org/S4306400194
best_oa_location.source.issn
best_oa_location.source.type	repository
best_oa_location.source.is_oa	True
best_oa_location.source.issn_l
best_oa_location.source.is_core	False
best_oa_location.source.is_in_doaj	False
best_oa_location.source.display_name	arXiv (Cornell University)
best_oa_location.source.host_organization	https://openalex.org/I205783295
best_oa_location.source.host_organization_name	Cornell University
best_oa_location.source.host_organization_lineage	https://openalex.org/I205783295
best_oa_location.license
best_oa_location.pdf_url	https://arxiv.org/pdf/2106.02229
best_oa_location.version	submittedVersion
best_oa_location.raw_type	text
best_oa_location.license_id
best_oa_location.is_accepted	False
best_oa_location.is_published	False
best_oa_location.raw_source_name
best_oa_location.landing_page_url	http://arxiv.org/abs/2106.02229
primary_location.id	pmh:oai:arXiv.org:2106.02229
primary_location.is_oa	True
primary_location.source.id	https://openalex.org/S4306400194
primary_location.source.issn
primary_location.source.type	repository
primary_location.source.is_oa	True
primary_location.source.issn_l
primary_location.source.is_core	False
primary_location.source.is_in_doaj	False
primary_location.source.display_name	arXiv (Cornell University)
primary_location.source.host_organization	https://openalex.org/I205783295
primary_location.source.host_organization_name	Cornell University
primary_location.source.host_organization_lineage	https://openalex.org/I205783295
primary_location.license
primary_location.pdf_url	https://arxiv.org/pdf/2106.02229
primary_location.version	submittedVersion
primary_location.raw_type	text
primary_location.license_id
primary_location.is_accepted	False
primary_location.is_published	False
primary_location.raw_source_name
primary_location.landing_page_url	http://arxiv.org/abs/2106.02229
publication_date	2021-06-04
publication_year	2021
referenced_works_count	0
abstract_inverted_index.a	26
abstract_inverted_index.3x	63
abstract_inverted_index.In	0
abstract_inverted_index.RL	59
abstract_inverted_index.To	8
abstract_inverted_index.an	106
abstract_inverted_index.as	25
abstract_inverted_index.at	61
abstract_inverted_index.in	112
abstract_inverted_index.is	104
abstract_inverted_index.on	47
abstract_inverted_index.to	19, 39, 43, 95
abstract_inverted_index.up	38, 94
abstract_inverted_index.we	3, 29, 72
abstract_inverted_index.30x	96
abstract_inverted_index.RL.	113
abstract_inverted_index.RL?	20
abstract_inverted_index.and	50, 57
abstract_inverted_index.are	11
abstract_inverted_index.but	87
abstract_inverted_index.can	36
abstract_inverted_index.for	109
abstract_inverted_index.its	84
abstract_inverted_index.not	76
abstract_inverted_index.the	5, 22, 32
abstract_inverted_index.250%	40
abstract_inverted_index.also	88
abstract_inverted_index.both	48
abstract_inverted_index.does	78
abstract_inverted_index.more	64, 97
abstract_inverted_index.only	62, 77
abstract_inverted_index.than	99
abstract_inverted_index.that	31, 75
abstract_inverted_index.this	1
abstract_inverted_index.tool	108
abstract_inverted_index.what	9
abstract_inverted_index.(NAS)	16
abstract_inverted_index.DARTS	24, 79, 103
abstract_inverted_index.Using	21
abstract_inverted_index.cells	93
abstract_inverted_index.found	35
abstract_inverted_index.space	53
abstract_inverted_index.time.	66
abstract_inverted_index.across	55
abstract_inverted_index.action	52
abstract_inverted_index.during	83
abstract_inverted_index.extent	10
abstract_inverted_index.manual	44
abstract_inverted_index.neural	13
abstract_inverted_index.paper,	2
abstract_inverted_index.random	100
abstract_inverted_index.search	15
abstract_inverted_index.verify	74
abstract_inverted_index.achieve	37
abstract_inverted_index.designs	46
abstract_inverted_index.phrase,	86
abstract_inverted_index.search,	101
abstract_inverted_index.through	68
abstract_inverted_index.ablation	70
abstract_inverted_index.compared	42
abstract_inverted_index.discover	30
abstract_inverted_index.discrete	33, 49, 92
abstract_inverted_index.improves	90
abstract_inverted_index.numerous	69
abstract_inverted_index.original	23
abstract_inverted_index.studies,	71
abstract_inverted_index.supernet	85
abstract_inverted_index.upweight	81
abstract_inverted_index.baseline,	28
abstract_inverted_index.correctly	80
abstract_inverted_index.effective	107
abstract_inverted_index.gradually	89
abstract_inverted_index.improving	110
abstract_inverted_index.on-policy	58
abstract_inverted_index.question:	7
abstract_inverted_index.resulting	91
abstract_inverted_index.applicable	18
abstract_inverted_index.continuous	51
abstract_inverted_index.convenient	27
abstract_inverted_index.off-policy	56
abstract_inverted_index.operations	82
abstract_inverted_index.suggesting	102
abstract_inverted_index.techniques	17
abstract_inverted_index.algorithms,	60
abstract_inverted_index.computation	65
abstract_inverted_index.efficiently	98
abstract_inverted_index.fundamental	6
abstract_inverted_index.investigate	4
abstract_inverted_index.performance	41
abstract_inverted_index.Furthermore,	67
abstract_inverted_index.architecture	14, 45
abstract_inverted_index.environments	54
abstract_inverted_index.surprisingly	105
abstract_inverted_index.architectures	34, 111
abstract_inverted_index.gradient-based	12
abstract_inverted_index.systematically	73
cited_by_percentile_year
countries_distinct_count	0
institutions_distinct_count	7
sustainable_development_goals[0].id	https://metadata.un.org/sdg/9
sustainable_development_goals[0].score	0.4699999988079071
sustainable_development_goals[0].display_name	Industry, innovation and infrastructure
citation_normalized_percentile