Multiplicative Controller Fusion: Leveraging Algorithmic Priors for Sample-efficient Reinforcement Learning and Safe Sim-To-Real Transfer

Krishan Rana, Vibhavari Dasagi, Ben Talbot, Michael Milford, Niko Sünderhauf

2020 · Open Access · DOI: https://doi.org/10.48550/arxiv.2003.05117

Learning-based approaches often outperform hand-coded algorithmic solutions for many problems in robotics. However, learning long-horizon tasks on real robot hardware can be intractable, and transferring a learned policy from simulation to reality is still extremely challenging. We present a novel approach to model-free reinforcement learning that can leverage existing sub-optimal solutions as an algorithmic prior during training and deployment. During training, our gated fusion approach enables the prior to guide the initial stages of exploration, increasing sample-efficiency and enabling learning from sparse long-horizon reward signals. Importantly, the policy can learn to improve beyond the performance of the sub-optimal prior since the prior's influence is annealed gradually. During deployment, the policy's uncertainty provides a reliable strategy for transferring a simulation-trained policy to the real world by falling back to the prior controller in uncertain states. We show the efficacy of our Multiplicative Controller Fusion approach on the task of robot navigation and demonstrate safe transfer from simulation to the real world without any fine-tuning. The code for this project is made publicly available at https://sites.google.com/view/mcf-nav/home
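
The abstract does not state the fusion rule itself. As a rough illustration only, the sketch below assumes that "multiplicative" fusion means taking the normalized product of two Gaussian action distributions, one from the learned policy and one from the prior controller. The function name `fuse_gaussians` and all numbers are hypothetical and are not taken from the paper or its code release. Under that assumption, the fused mean is weighted by precision, so a policy that reports high uncertainty contributes little and the fused action stays close to the prior, which matches the fall-back behaviour described for deployment.

```python
import math


def fuse_gaussians(mu_policy, sigma_policy, mu_prior, sigma_prior):
    """Normalized product of two 1-D Gaussians (an assumed fusion rule).

    Returns the mean and standard deviation of the fused distribution.
    The mean is a precision-weighted average of the two input means.
    """
    var_policy = sigma_policy ** 2
    var_prior = sigma_prior ** 2
    fused_var = (var_policy * var_prior) / (var_policy + var_prior)
    fused_mu = (mu_policy * var_prior + mu_prior * var_policy) / (var_policy + var_prior)
    return fused_mu, math.sqrt(fused_var)


# Hypothetical values: the prior proposes action 0.0, the policy proposes 1.0.
mu_prior, sigma_prior = 0.0, 1.0
mu_policy = 1.0

# Confident policy (small sigma): the fused action follows the policy.
confident_mu, confident_sigma = fuse_gaussians(mu_policy, 0.1, mu_prior, sigma_prior)

# Uncertain policy (large sigma): the fused action falls back toward the prior.
uncertain_mu, uncertain_sigma = fuse_gaussians(mu_policy, 10.0, mu_prior, sigma_prior)

print(f"confident policy -> fused mean {confident_mu:.3f}")
print(f"uncertain policy -> fused mean {uncertain_mu:.3f}")
```

The gating and annealing used during training are not covered by this sketch, since the abstract gives no detail on how they are scheduled.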

Related Topics

Reinforcement Learning

Computer Science

Artificial Intelligence

Mathematical Analysis

Biology

Concepts

Reinforcement learning, Leverage (statistics), Computer science, Software deployment, Artificial intelligence, Robotics, Robot, Controller (irrigation), Transfer of learning, Sample (material), Prior probability, Machine learning, Task (project management), Multiplicative function, Bayesian probability, Engineering, Operating system, Chromatography, Agronomy, Mathematics, Chemistry, Systems engineering, Mathematical analysis, Biology

Metadata

Type: preprint
Language: en
Landing Page: http://arxiv.org/abs/2003.05117
PDF: https://arxiv.org/pdf/2003.05117
OA Status: green
Related Works: 10
OpenAlex ID: https://openalex.org/W4287827453

All OpenAlex metadata

OpenAlex ID: https://openalex.org/W4287827453

Canonical identifier for this work in OpenAlex
DOI: https://doi.org/10.48550/arxiv.2003.05117

Digital Object Identifier
Title: Multiplicative Controller Fusion: Leveraging Algorithmic Priors for Sample-efficient Reinforcement Learning and Safe Sim-To-Real Transfer

Work title
Type: preprint

OpenAlex work type
Language: en

Primary language
Publication year: 2020

Year of publication
Publication date: 2020-03-11

Full publication date if available
Authors: Krishan Rana, Vibhavari Dasagi, Ben Talbot, Michael Milford, Niko Sünderhauf

List of authors in order
Landing page: https://arxiv.org/abs/2003.05117

Publisher landing page
PDF URL: https://arxiv.org/pdf/2003.05117

Direct link to full text PDF
Open access: Yes

Whether a free full text is available
OA status: green

Open access status per OpenAlex
OA URL: https://arxiv.org/pdf/2003.05117

Direct OA link when available
Concepts: Reinforcement learning, Leverage (statistics), Computer science, Software deployment, Artificial intelligence, Robotics, Robot, Controller (irrigation), Transfer of learning, Sample (material), Prior probability, Machine learning, Task (project management), Multiplicative function, Bayesian probability, Engineering, Operating system, Chromatography, Agronomy, Mathematics, Chemistry, Systems engineering, Mathematical analysis, Biology

Top concepts (fields/topics) attached by OpenAlex
Cited by: 0

Total citation count in OpenAlex
Related works (count): 10

Other works algorithmically related by OpenAlex

Full payload

id	https://openalex.org/W4287827453
doi	https://doi.org/10.48550/arxiv.2003.05117
ids.openalex	https://openalex.org/W4287827453
fwci	0.0
type	preprint
title	Multiplicative Controller Fusion: Leveraging Algorithmic Priors for\n Sample-efficient Reinforcement Learning and Safe Sim-To-Real Transfer
biblio.issue
biblio.volume
biblio.last_page
biblio.first_page
topics[0].id	https://openalex.org/T10462
topics[0].field.id	https://openalex.org/fields/17
topics[0].field.display_name	Computer Science
topics[0].score	0.9925000071525574
topics[0].domain.id	https://openalex.org/domains/3
topics[0].domain.display_name	Physical Sciences
topics[0].subfield.id	https://openalex.org/subfields/1702
topics[0].subfield.display_name	Artificial Intelligence
topics[0].display_name	Reinforcement Learning in Robotics
topics[1].id	https://openalex.org/T11689
topics[1].field.id	https://openalex.org/fields/17
topics[1].field.display_name	Computer Science
topics[1].score	0.9785000085830688
topics[1].domain.id	https://openalex.org/domains/3
topics[1].domain.display_name	Physical Sciences
topics[1].subfield.id	https://openalex.org/subfields/1702
topics[1].subfield.display_name	Artificial Intelligence
topics[1].display_name	Adversarial Robustness in Machine Learning
topics[2].id	https://openalex.org/T12814
topics[2].field.id	https://openalex.org/fields/17
topics[2].field.display_name	Computer Science
topics[2].score	0.9628000259399414
topics[2].domain.id	https://openalex.org/domains/3
topics[2].domain.display_name	Physical Sciences
topics[2].subfield.id	https://openalex.org/subfields/1702
topics[2].subfield.display_name	Artificial Intelligence
topics[2].display_name	Gaussian Processes and Bayesian Inference
is_xpac	False
apc_list
apc_paid
concepts[0].id	https://openalex.org/C97541855
concepts[0].level	2
concepts[0].score	0.8168928623199463
concepts[0].wikidata	https://www.wikidata.org/wiki/Q830687
concepts[0].display_name	Reinforcement learning
concepts[1].id	https://openalex.org/C153083717
concepts[1].level	2
concepts[1].score	0.7460945248603821
concepts[1].wikidata	https://www.wikidata.org/wiki/Q6535263
concepts[1].display_name	Leverage (statistics)
concepts[2].id	https://openalex.org/C41008148
concepts[2].level	0
concepts[2].score	0.7272384762763977
concepts[2].wikidata	https://www.wikidata.org/wiki/Q21198
concepts[2].display_name	Computer science
concepts[3].id	https://openalex.org/C105339364
concepts[3].level	2
concepts[3].score	0.6316240429878235
concepts[3].wikidata	https://www.wikidata.org/wiki/Q2297740
concepts[3].display_name	Software deployment
concepts[4].id	https://openalex.org/C154945302
concepts[4].level	1
concepts[4].score	0.6192331910133362
concepts[4].wikidata	https://www.wikidata.org/wiki/Q11660
concepts[4].display_name	Artificial intelligence
concepts[5].id	https://openalex.org/C34413123
concepts[5].level	3
concepts[5].score	0.5419933795928955
concepts[5].wikidata	https://www.wikidata.org/wiki/Q170978
concepts[5].display_name	Robotics
concepts[6].id	https://openalex.org/C90509273
concepts[6].level	2
concepts[6].score	0.5385355353355408
concepts[6].wikidata	https://www.wikidata.org/wiki/Q11012
concepts[6].display_name	Robot
concepts[7].id	https://openalex.org/C203479927
concepts[7].level	2
concepts[7].score	0.4826543927192688
concepts[7].wikidata	https://www.wikidata.org/wiki/Q5165939
concepts[7].display_name	Controller (irrigation)
concepts[8].id	https://openalex.org/C150899416
concepts[8].level	2
concepts[8].score	0.47021105885505676
concepts[8].wikidata	https://www.wikidata.org/wiki/Q1820378
concepts[8].display_name	Transfer of learning
concepts[9].id	https://openalex.org/C198531522
concepts[9].level	2
concepts[9].score	0.46649500727653503
concepts[9].wikidata	https://www.wikidata.org/wiki/Q485146
concepts[9].display_name	Sample (material)
concepts[10].id	https://openalex.org/C177769412
concepts[10].level	3
concepts[10].score	0.464738130569458
concepts[10].wikidata	https://www.wikidata.org/wiki/Q278090
concepts[10].display_name	Prior probability
concepts[11].id	https://openalex.org/C119857082
concepts[11].level	1
concepts[11].score	0.4620271325111389
concepts[11].wikidata	https://www.wikidata.org/wiki/Q2539
concepts[11].display_name	Machine learning
concepts[12].id	https://openalex.org/C2780451532
concepts[12].level	2
concepts[12].score	0.4461824595928192
concepts[12].wikidata	https://www.wikidata.org/wiki/Q759676
concepts[12].display_name	Task (project management)
concepts[13].id	https://openalex.org/C42747912
concepts[13].level	2
concepts[13].score	0.4160654544830322
concepts[13].wikidata	https://www.wikidata.org/wiki/Q1048447
concepts[13].display_name	Multiplicative function
concepts[14].id	https://openalex.org/C107673813
concepts[14].level	2
concepts[14].score	0.3114786148071289
concepts[14].wikidata	https://www.wikidata.org/wiki/Q812534
concepts[14].display_name	Bayesian probability
concepts[15].id	https://openalex.org/C127413603
concepts[15].level	0
concepts[15].score	0.11840629577636719
concepts[15].wikidata	https://www.wikidata.org/wiki/Q11023
concepts[15].display_name	Engineering
concepts[16].id	https://openalex.org/C111919701
concepts[16].level	1
concepts[16].score	0.0
concepts[16].wikidata	https://www.wikidata.org/wiki/Q9135
concepts[16].display_name	Operating system
concepts[17].id	https://openalex.org/C43617362
concepts[17].level	1
concepts[17].score	0.0
concepts[17].wikidata	https://www.wikidata.org/wiki/Q170050
concepts[17].display_name	Chromatography
concepts[18].id	https://openalex.org/C6557445
concepts[18].level	1
concepts[18].score	0.0
concepts[18].wikidata	https://www.wikidata.org/wiki/Q173113
concepts[18].display_name	Agronomy
concepts[19].id	https://openalex.org/C33923547
concepts[19].level	0
concepts[19].score	0.0
concepts[19].wikidata	https://www.wikidata.org/wiki/Q395
concepts[19].display_name	Mathematics
concepts[20].id	https://openalex.org/C185592680
concepts[20].level	0
concepts[20].score	0.0
concepts[20].wikidata	https://www.wikidata.org/wiki/Q2329
concepts[20].display_name	Chemistry
concepts[21].id	https://openalex.org/C201995342
concepts[21].level	1
concepts[21].score	0.0
concepts[21].wikidata	https://www.wikidata.org/wiki/Q682496
concepts[21].display_name	Systems engineering
concepts[22].id	https://openalex.org/C134306372
concepts[22].level	1
concepts[22].score	0.0
concepts[22].wikidata	https://www.wikidata.org/wiki/Q7754
concepts[22].display_name	Mathematical analysis
concepts[23].id	https://openalex.org/C86803240
concepts[23].level	0
concepts[23].score	0.0
concepts[23].wikidata	https://www.wikidata.org/wiki/Q420
concepts[23].display_name	Biology
keywords[0].id	https://openalex.org/keywords/reinforcement-learning
keywords[0].score	0.8168928623199463
keywords[0].display_name	Reinforcement learning
keywords[1].id	https://openalex.org/keywords/leverage
keywords[1].score	0.7460945248603821
keywords[1].display_name	Leverage (statistics)
keywords[2].id	https://openalex.org/keywords/computer-science
keywords[2].score	0.7272384762763977
keywords[2].display_name	Computer science
keywords[3].id	https://openalex.org/keywords/software-deployment
keywords[3].score	0.6316240429878235
keywords[3].display_name	Software deployment
keywords[4].id	https://openalex.org/keywords/artificial-intelligence
keywords[4].score	0.6192331910133362
keywords[4].display_name	Artificial intelligence
keywords[5].id	https://openalex.org/keywords/robotics
keywords[5].score	0.5419933795928955
keywords[5].display_name	Robotics
keywords[6].id	https://openalex.org/keywords/robot
keywords[6].score	0.5385355353355408
keywords[6].display_name	Robot
keywords[7].id	https://openalex.org/keywords/controller
keywords[7].score	0.4826543927192688
keywords[7].display_name	Controller (irrigation)
keywords[8].id	https://openalex.org/keywords/transfer-of-learning
keywords[8].score	0.47021105885505676
keywords[8].display_name	Transfer of learning
keywords[9].id	https://openalex.org/keywords/sample
keywords[9].score	0.46649500727653503
keywords[9].display_name	Sample (material)
keywords[10].id	https://openalex.org/keywords/prior-probability
keywords[10].score	0.464738130569458
keywords[10].display_name	Prior probability
keywords[11].id	https://openalex.org/keywords/machine-learning
keywords[11].score	0.4620271325111389
keywords[11].display_name	Machine learning
keywords[12].id	https://openalex.org/keywords/task
keywords[12].score	0.4461824595928192
keywords[12].display_name	Task (project management)
keywords[13].id	https://openalex.org/keywords/multiplicative-function
keywords[13].score	0.4160654544830322
keywords[13].display_name	Multiplicative function
keywords[14].id	https://openalex.org/keywords/bayesian-probability
keywords[14].score	0.3114786148071289
keywords[14].display_name	Bayesian probability
keywords[15].id	https://openalex.org/keywords/engineering
keywords[15].score	0.11840629577636719
keywords[15].display_name	Engineering
language	en
locations[0].id	pmh:oai:arXiv.org:2003.05117
locations[0].is_oa	True
locations[0].source.id	https://openalex.org/S4306400194
locations[0].source.issn
locations[0].source.type	repository
locations[0].source.is_oa	True
locations[0].source.issn_l
locations[0].source.is_core	False
locations[0].source.is_in_doaj	False
locations[0].source.display_name	arXiv (Cornell University)
locations[0].source.host_organization	https://openalex.org/I205783295
locations[0].source.host_organization_name	Cornell University
locations[0].source.host_organization_lineage	https://openalex.org/I205783295
locations[0].license
locations[0].pdf_url	https://arxiv.org/pdf/2003.05117
locations[0].version	submittedVersion
locations[0].raw_type	text
locations[0].license_id
locations[0].is_accepted	False
locations[0].is_published	False
locations[0].raw_source_name
locations[0].landing_page_url	http://arxiv.org/abs/2003.05117
indexed_in	arxiv
authorships[0].author.id	https://openalex.org/A5001506562
authorships[0].author.orcid	https://orcid.org/0000-0002-9028-9295
authorships[0].author.display_name	Krishan Rana
authorships[0].author_position	first
authorships[0].raw_author_name	Rana, Krishan
authorships[0].is_corresponding	False
authorships[1].author.id	https://openalex.org/A5012909623
authorships[1].author.orcid
authorships[1].author.display_name	Vibhavari Dasagi
authorships[1].author_position	middle
authorships[1].raw_author_name	Dasagi, Vibhavari
authorships[1].is_corresponding	False
authorships[2].author.id	https://openalex.org/A5055734658
authorships[2].author.orcid	https://orcid.org/0000-0001-7029-7813
authorships[2].author.display_name	Ben Talbot
authorships[2].author_position	middle
authorships[2].raw_author_name	Talbot, Ben
authorships[2].is_corresponding	False
authorships[3].author.id	https://openalex.org/A5078340555
authorships[3].author.orcid	https://orcid.org/0000-0002-5162-1793
authorships[3].author.display_name	Michael Milford
authorships[3].author_position	middle
authorships[3].raw_author_name	Milford, Michael
authorships[3].is_corresponding	False
authorships[4].author.id	https://openalex.org/A5034957065
authorships[4].author.orcid	https://orcid.org/0000-0001-5286-3789
authorships[4].author.display_name	Niko Sünderhauf
authorships[4].author_position	last
authorships[4].raw_author_name	Sünderhauf, Niko
authorships[4].is_corresponding	False
has_content.pdf	False
has_content.grobid_xml	False
is_paratext	False
open_access.is_oa	True
open_access.oa_url	https://arxiv.org/pdf/2003.05117
open_access.oa_status	green
open_access.any_repository_has_fulltext	False
created_date	2022-07-26T00:00:00
display_name	Multiplicative Controller Fusion: Leveraging Algorithmic Priors for\n Sample-efficient Reinforcement Learning and Safe Sim-To-Real Transfer
has_fulltext	False
is_retracted	False
updated_date	2025-11-06T03:46:38.306776
primary_topic.id	https://openalex.org/T10462
primary_topic.field.id	https://openalex.org/fields/17
primary_topic.field.display_name	Computer Science
primary_topic.score	0.9925000071525574
primary_topic.domain.id	https://openalex.org/domains/3
primary_topic.domain.display_name	Physical Sciences
primary_topic.subfield.id	https://openalex.org/subfields/1702
primary_topic.subfield.display_name	Artificial Intelligence
primary_topic.display_name	Reinforcement Learning in Robotics
related_works	https://openalex.org/W2580650124, https://openalex.org/W4386190339, https://openalex.org/W2968424575, https://openalex.org/W3142333283, https://openalex.org/W3122088529, https://openalex.org/W3041320102, https://openalex.org/W2111669074, https://openalex.org/W2085259108, https://openalex.org/W3123087812, https://openalex.org/W2063076820
cited_by_count	0
locations_count	1
best_oa_location.id	pmh:oai:arXiv.org:2003.05117
best_oa_location.is_oa	True
best_oa_location.source.id	https://openalex.org/S4306400194
best_oa_location.source.issn
best_oa_location.source.type	repository
best_oa_location.source.is_oa	True
best_oa_location.source.issn_l
best_oa_location.source.is_core	False
best_oa_location.source.is_in_doaj	False
best_oa_location.source.display_name	arXiv (Cornell University)
best_oa_location.source.host_organization	https://openalex.org/I205783295
best_oa_location.source.host_organization_name	Cornell University
best_oa_location.source.host_organization_lineage	https://openalex.org/I205783295
best_oa_location.license
best_oa_location.pdf_url	https://arxiv.org/pdf/2003.05117
best_oa_location.version	submittedVersion
best_oa_location.raw_type	text
best_oa_location.license_id
best_oa_location.is_accepted	False
best_oa_location.is_published	False
best_oa_location.raw_source_name
best_oa_location.landing_page_url	http://arxiv.org/abs/2003.05117
primary_location.id	pmh:oai:arXiv.org:2003.05117
primary_location.is_oa	True
primary_location.source.id	https://openalex.org/S4306400194
primary_location.source.issn
primary_location.source.type	repository
primary_location.source.is_oa	True
primary_location.source.issn_l
primary_location.source.is_core	False
primary_location.source.is_in_doaj	False
primary_location.source.display_name	arXiv (Cornell University)
primary_location.source.host_organization	https://openalex.org/I205783295
primary_location.source.host_organization_name	Cornell University
primary_location.source.host_organization_lineage	https://openalex.org/I205783295
primary_location.license
primary_location.pdf_url	https://arxiv.org/pdf/2003.05117
primary_location.version	submittedVersion
primary_location.raw_type	text
primary_location.license_id
primary_location.is_accepted	False
primary_location.is_published	False
primary_location.raw_source_name
primary_location.landing_page_url	http://arxiv.org/abs/2003.05117
publication_date	2020-03-11
publication_year	2020
referenced_works_count	0
abstract_inverted_index.a	23, 35, 101, 106
abstract_inverted_index.We	33, 121
abstract_inverted_index.an	47
abstract_inverted_index.as	46
abstract_inverted_index.at	156
abstract_inverted_index.be	19
abstract_inverted_index.by	112
abstract_inverted_index.in	9, 119
abstract_inverted_index.is	29, 93, 153
abstract_inverted_index.of	66, 86, 125, 133
abstract_inverted_index.on	15
abstract_inverted_index.to	27, 37, 62, 81, 108, 115
abstract_inverted_index.The	148
abstract_inverted_index.and	21, 52, 70, 136
abstract_inverted_index.any	146
abstract_inverted_index.can	18, 42
abstract_inverted_index.for	104, 150
abstract_inverted_index.our	55, 126
abstract_inverted_index.the	60, 78, 84, 87, 98, 109, 116, 123, 131, 142
abstract_inverted_index.back	114
abstract_inverted_index.code	149
abstract_inverted_index.from	72, 140
abstract_inverted_index.many	7
abstract_inverted_index.real	110, 143
abstract_inverted_index.safe	138
abstract_inverted_index.show	122
abstract_inverted_index.task	132
abstract_inverted_index.that	41
abstract_inverted_index.this	151
abstract_inverted_index.gated	56
abstract_inverted_index.guide	63
abstract_inverted_index.often	2
abstract_inverted_index.prior	49, 61, 89, 117
abstract_inverted_index.robot	134
abstract_inverted_index.since	90
abstract_inverted_index.still	30
abstract_inverted_index.tasks	14
abstract_inverted_index.world	111, 144
abstract_inverted_index.During	96
abstract_inverted_index.Fusion	129
abstract_inverted_index.beyond	83
abstract_inverted_index.during	50
abstract_inverted_index.fusion	57
abstract_inverted_index.policy	25, 79
abstract_inverted_index.reward	75
abstract_inverted_index.sparse	73
abstract_inverted_index.stages	65
abstract_inverted_index.enables	59
abstract_inverted_index.falling	113
abstract_inverted_index.improve	82
abstract_inverted_index.learned	24
abstract_inverted_index.present	34
abstract_inverted_index.project	152
abstract_inverted_index.reality	28
abstract_inverted_index.without	145
abstract_inverted_index.However,	11
abstract_inverted_index.annealed	94
abstract_inverted_index.approach	58
abstract_inverted_index.efficacy	124
abstract_inverted_index.hardware	17
abstract_inverted_index.learning	12, 40
abstract_inverted_index.leverage	43
abstract_inverted_index.problems	8
abstract_inverted_index.provides	100
abstract_inverted_index.reliable	102
abstract_inverted_index.signals.	76
abstract_inverted_index.strategy	103
abstract_inverted_index.training	51
abstract_inverted_index.transfer	139
abstract_inverted_index.available	155
abstract_inverted_index.extremely	31
abstract_inverted_index.influence	92
abstract_inverted_index.robotics.	10
abstract_inverted_index.solutions	45
abstract_inverted_index.training,	54
abstract_inverted_index.Controller	128
abstract_inverted_index.approaches	1
abstract_inverted_index.can\nlearn	80
abstract_inverted_index.controller	118
abstract_inverted_index.gradually.	95
abstract_inverted_index.hand-coded	4
abstract_inverted_index.increasing	68
abstract_inverted_index.model-free	38
abstract_inverted_index.navigation	135
abstract_inverted_index.outperform	3
abstract_inverted_index.algorithmic	5, 48
abstract_inverted_index.demonstrate	137
abstract_inverted_index.deployment,	97
abstract_inverted_index.performance	85
abstract_inverted_index.real\nrobot	16
abstract_inverted_index.sub-optimal	88
abstract_inverted_index.Importantly,	77
abstract_inverted_index.approach\non	130
abstract_inverted_index.challenging.	32
abstract_inverted_index.exploration,	67
abstract_inverted_index.fine-tuning.	147
abstract_inverted_index.intractable,	20
abstract_inverted_index.long-horizon	13, 74
abstract_inverted_index.the\ninitial	64
abstract_inverted_index.the\nprior's	91
abstract_inverted_index.transferring	22, 105
abstract_inverted_index.reinforcement	39
abstract_inverted_index.Learning-based	0
abstract_inverted_index.Multiplicative	127
abstract_inverted_index.made\npublicly	154
abstract_inverted_index.simulation\nto	141
abstract_inverted_index.solutions\nfor	6
abstract_inverted_index.novel\napproach	36
abstract_inverted_index.from\nsimulation	26
abstract_inverted_index.sample-efficiency	69
abstract_inverted_index.enabling\nlearning	71
abstract_inverted_index.uncertain\nstates.	120
abstract_inverted_index.deployment.\nDuring	53
abstract_inverted_index.existing\nsub-optimal	44
abstract_inverted_index.policy's\nuncertainty	99
abstract_inverted_index.simulation-trained\npolicy	107
abstract_inverted_index.https://sites.google.com/view/mcf-nav/home\n	157
cited_by_percentile_year
countries_distinct_count	0
institutions_distinct_count	5
sustainable_development_goals[0].id	https://metadata.un.org/sdg/17
sustainable_development_goals[0].score	0.4300000071525574
sustainable_development_goals[0].display_name	Partnerships for the goals
citation_normalized_percentile.value	0.29899756
citation_normalized_percentile.is_in_top_1_percent	False
citation_normalized_percentile.is_in_top_10_percent	False
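
The `abstract_inverted_index` fields in the payload store the abstract as a map from each token to the word positions where it occurs. A minimal sketch of turning such a map back into running text is shown below. The helper name `rebuild_abstract` is hypothetical, and the excerpt covers only positions 0 to 10 as listed in the payload. Some tokens carry an embedded line break (shown as `\n` in the payload), so the sketch normalizes whitespace at the end.

```python
def rebuild_abstract(inverted_index):
    """Rebuild running text from a {token: [positions]} mapping."""
    positioned = [
        (position, token)
        for token, positions in inverted_index.items()
        for position in positions
    ]
    positioned.sort()  # order tokens by their position in the abstract
    text = " ".join(token for _, token in positioned)
    # Collapse embedded newlines and repeated spaces into single spaces.
    return " ".join(text.split())


# Excerpt of the payload above: tokens at positions 0 through 10.
excerpt = {
    "Learning-based": [0],
    "approaches": [1],
    "often": [2],
    "outperform": [3],
    "hand-coded": [4],
    "algorithmic": [5],
    "solutions\nfor": [6],  # token with an embedded line break
    "many": [7],
    "problems": [8],
    "in": [9],
    "robotics.": [10],
}

first_sentence = rebuild_abstract(excerpt)
print(first_sentence)
```

Applied to the full index, the same function would be expected to reproduce the abstract shown near the top of this record.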