Federated Offline Reinforcement Learning: Collaborative Single-Policy Coverage Suffices


Jiin Woo, Laixi Shi, Gauri Joshi, Yuejie Chi

2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2402.05876

Offline reinforcement learning (RL), which seeks to learn an optimal policy using offline data, has garnered significant interest due to its potential in critical applications where online data collection is infeasible or expensive. This work explores the benefit of federated learning for offline RL, aiming to collaboratively leverage offline datasets held by multiple agents. Focusing on finite-horizon episodic tabular Markov decision processes (MDPs), we design FedLCB-Q, a variant of the popular model-free Q-learning algorithm tailored for federated offline RL. FedLCB-Q updates local Q-functions at agents with novel learning rate schedules and aggregates them at a central server using importance averaging and a carefully designed pessimistic penalty term. Our sample complexity analysis reveals that, with appropriately chosen parameters and synchronization schedules, FedLCB-Q achieves linear speedup in the number of agents without requiring high-quality datasets at individual agents, as long as the local datasets collectively cover the state-action space visited by the optimal policy, highlighting the power of collaboration in the federated setting. In fact, the sample complexity almost matches that of the single-agent counterpart, as if all the data were stored at a central location, up to polynomial factors of the horizon length. Furthermore, FedLCB-Q is communication-efficient: the number of communication rounds is only linear in the horizon length, up to logarithmic factors.
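
The abstract gives only the outline of the algorithm (local Q-learning updates, importance averaging at a server, a pessimistic penalty), so the following is a rough Python sketch of that general pattern, not FedLCB-Q itself. Everything specific here is an assumption made for illustration: the function names `local_q_update` and `aggregate_pessimistic`, the learning rate `(H + 1) / (H + n)`, the use of local visit counts as the averaging weights, and the penalty `c_b * H / sqrt(N)`. The paper's actual schedules and penalty term should be taken from the paper.

```python
import math


def local_q_update(q, counts, transitions, v_next, horizon):
    """One agent's pass over its offline transitions at a fixed step h.

    q, counts: dicts keyed by (state, action), updated in place.
    transitions: iterable of (state, action, reward, next_state).
    v_next: dict mapping next_state to the value estimate at step h + 1.
    """
    for s, a, r, s_next in transitions:
        n = counts.get((s, a), 0) + 1
        counts[(s, a)] = n
        # ASSUMPTION: rescaled-linear rate (H + 1) / (H + n), a common choice
        # in Q-learning analyses; this is not the paper's schedule.
        eta = (horizon + 1) / (horizon + n)
        target = r + v_next.get(s_next, 0.0)
        q[(s, a)] = (1.0 - eta) * q.get((s, a), 0.0) + eta * target


def aggregate_pessimistic(local_qs, local_counts, horizon, c_b=1.0):
    """Server step: count-weighted average of local Q-values minus a penalty.

    ASSUMPTION: weights are proportional to each agent's visit count, and the
    penalty is c_b * H / sqrt(total visits); both are illustrative stand-ins.
    """
    total = {}
    for counts in local_counts:
        for sa, n in counts.items():
            total[sa] = total.get(sa, 0) + n

    aggregated = {}
    for sa, n_total in total.items():
        if n_total == 0:
            continue
        weighted_sum = sum(
            counts.get(sa, 0) * q.get(sa, 0.0)
            for q, counts in zip(local_qs, local_counts)
        )
        penalty = c_b * horizon / math.sqrt(n_total)
        # Pessimism: subtract the penalty, then clip at zero.
        aggregated[sa] = max(weighted_sum / n_total - penalty, 0.0)
    return aggregated


# Two agents with uneven coverage: only agent 0 has seen the pair (1, 0).
local_qs = [{(0, 0): 1.0, (1, 0): 0.2}, {(0, 0): 0.0}]
local_counts = [{(0, 0): 3, (1, 0): 1}, {(0, 0): 1}]
print(aggregate_pessimistic(local_qs, local_counts, horizon=1, c_b=0.5))
# {(0, 0): 0.5, (1, 0): 0.0}
```

In this toy run the pair `(1, 0)` appears in only one agent's data, yet the server still produces an estimate for it; the smaller pooled count simply yields a larger penalty. That is the intuition behind collective rather than per-agent coverage, under the assumptions stated above.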

Related Topics

Reinforcement Learning

Computer Science

Artificial Intelligence

Social Psychology

Concepts

Reinforcement learning, Computer science, Reinforcement, Artificial intelligence, Psychology, Social psychology

Metadata

Type: preprint
Language: en
Landing Page: http://arxiv.org/abs/2402.05876
PDF: https://arxiv.org/pdf/2402.05876
OA Status: green
Related Works: 10
OpenAlex ID: https://openalex.org/W4391710056

All OpenAlex metadata


OpenAlex ID: https://openalex.org/W4391710056
Canonical identifier for this work in OpenAlex

DOI: https://doi.org/10.48550/arxiv.2402.05876
Digital Object Identifier

Title: Federated Offline Reinforcement Learning: Collaborative Single-Policy Coverage Suffices
Work title

Type: preprint
OpenAlex work type

Language: en
Primary language

Publication year: 2024
Year of publication

Publication date: 2024-02-08
Full publication date if available

Authors: Jiin Woo, Laixi Shi, Gauri Joshi, Yuejie Chi
List of authors in order

Landing page: https://arxiv.org/abs/2402.05876
Publisher landing page

PDF URL: https://arxiv.org/pdf/2402.05876
Direct link to full text PDF

Open access: Yes
Whether a free full text is available

OA status: green
Open access status per OpenAlex

OA URL: https://arxiv.org/pdf/2402.05876
Direct OA link when available

Concepts: Reinforcement learning, Computer science, Reinforcement, Artificial intelligence, Psychology, Social psychology
Top concepts (fields/topics) attached by OpenAlex

Cited by: 0
Total citation count in OpenAlex

Related works (count): 10
Other works algorithmically related by OpenAlex

Full payload

id	https://openalex.org/W4391710056
doi	https://doi.org/10.48550/arxiv.2402.05876
ids.doi	https://doi.org/10.48550/arxiv.2402.05876
ids.openalex	https://openalex.org/W4391710056
fwci
type	preprint
title	Federated Offline Reinforcement Learning: Collaborative Single-Policy Coverage Suffices
biblio.issue
biblio.volume
biblio.last_page
biblio.first_page
topics[0].id	https://openalex.org/T11942
topics[0].field.id	https://openalex.org/fields/22
topics[0].field.display_name	Engineering
topics[0].score	0.9397000074386597
topics[0].domain.id	https://openalex.org/domains/3
topics[0].domain.display_name	Physical Sciences
topics[0].subfield.id	https://openalex.org/subfields/2203
topics[0].subfield.display_name	Automotive Engineering
topics[0].display_name	Transportation and Mobility Innovations
is_xpac	False
apc_list
apc_paid
concepts[0].id	https://openalex.org/C97541855
concepts[0].level	2
concepts[0].score	0.7879638671875
concepts[0].wikidata	https://www.wikidata.org/wiki/Q830687
concepts[0].display_name	Reinforcement learning
concepts[1].id	https://openalex.org/C41008148
concepts[1].level	0
concepts[1].score	0.5677229166030884
concepts[1].wikidata	https://www.wikidata.org/wiki/Q21198
concepts[1].display_name	Computer science
concepts[2].id	https://openalex.org/C67203356
concepts[2].level	2
concepts[2].score	0.5671182870864868
concepts[2].wikidata	https://www.wikidata.org/wiki/Q1321905
concepts[2].display_name	Reinforcement
concepts[3].id	https://openalex.org/C154945302
concepts[3].level	1
concepts[3].score	0.33068013191223145
concepts[3].wikidata	https://www.wikidata.org/wiki/Q11660
concepts[3].display_name	Artificial intelligence
concepts[4].id	https://openalex.org/C15744967
concepts[4].level	0
concepts[4].score	0.1921556293964386
concepts[4].wikidata	https://www.wikidata.org/wiki/Q9418
concepts[4].display_name	Psychology
concepts[5].id	https://openalex.org/C77805123
concepts[5].level	1
concepts[5].score	0.11188578605651855
concepts[5].wikidata	https://www.wikidata.org/wiki/Q161272
concepts[5].display_name	Social psychology
keywords[0].id	https://openalex.org/keywords/reinforcement-learning
keywords[0].score	0.7879638671875
keywords[0].display_name	Reinforcement learning
keywords[1].id	https://openalex.org/keywords/computer-science
keywords[1].score	0.5677229166030884
keywords[1].display_name	Computer science
keywords[2].id	https://openalex.org/keywords/reinforcement
keywords[2].score	0.5671182870864868
keywords[2].display_name	Reinforcement
keywords[3].id	https://openalex.org/keywords/artificial-intelligence
keywords[3].score	0.33068013191223145
keywords[3].display_name	Artificial intelligence
keywords[4].id	https://openalex.org/keywords/psychology
keywords[4].score	0.1921556293964386
keywords[4].display_name	Psychology
keywords[5].id	https://openalex.org/keywords/social-psychology
keywords[5].score	0.11188578605651855
keywords[5].display_name	Social psychology
language	en
locations[0].id	pmh:oai:arXiv.org:2402.05876
locations[0].is_oa	True
locations[0].source.id	https://openalex.org/S4306400194
locations[0].source.issn
locations[0].source.type	repository
locations[0].source.is_oa	True
locations[0].source.issn_l
locations[0].source.is_core	False
locations[0].source.is_in_doaj	False
locations[0].source.display_name	arXiv (Cornell University)
locations[0].source.host_organization	https://openalex.org/I205783295
locations[0].source.host_organization_name	Cornell University
locations[0].source.host_organization_lineage	https://openalex.org/I205783295
locations[0].license
locations[0].pdf_url	https://arxiv.org/pdf/2402.05876
locations[0].version	submittedVersion
locations[0].raw_type	text
locations[0].license_id
locations[0].is_accepted	False
locations[0].is_published	False
locations[0].raw_source_name
locations[0].landing_page_url	http://arxiv.org/abs/2402.05876
locations[1].id	doi:10.48550/arxiv.2402.05876
locations[1].is_oa	True
locations[1].source.id	https://openalex.org/S4306400194
locations[1].source.issn
locations[1].source.type	repository
locations[1].source.is_oa	True
locations[1].source.issn_l
locations[1].source.is_core	False
locations[1].source.is_in_doaj	False
locations[1].source.display_name	arXiv (Cornell University)
locations[1].source.host_organization	https://openalex.org/I205783295
locations[1].source.host_organization_name	Cornell University
locations[1].source.host_organization_lineage	https://openalex.org/I205783295
locations[1].license
locations[1].pdf_url
locations[1].version
locations[1].raw_type	article
locations[1].license_id
locations[1].is_accepted	False
locations[1].is_published
locations[1].raw_source_name
locations[1].landing_page_url	https://doi.org/10.48550/arxiv.2402.05876
indexed_in	arxiv, datacite
authorships[0].author.id	https://openalex.org/A5102514656
authorships[0].author.orcid
authorships[0].author.display_name	Jiin Woo
authorships[0].author_position	first
authorships[0].raw_author_name	Woo, Jiin
authorships[0].is_corresponding	False
authorships[1].author.id	https://openalex.org/A5075795654
authorships[1].author.orcid	https://orcid.org/0000-0003-4038-8620
authorships[1].author.display_name	Laixi Shi
authorships[1].author_position	middle
authorships[1].raw_author_name	Shi, Laixi
authorships[1].is_corresponding	False
authorships[2].author.id	https://openalex.org/A5067441201
authorships[2].author.orcid	https://orcid.org/0000-0002-6372-9697
authorships[2].author.display_name	Gauri Joshi
authorships[2].author_position	middle
authorships[2].raw_author_name	Joshi, Gauri
authorships[2].is_corresponding	False
authorships[3].author.id	https://openalex.org/A5053809095
authorships[3].author.orcid	https://orcid.org/0000-0002-6766-5459
authorships[3].author.display_name	Yuejie Chi
authorships[3].author_position	last
authorships[3].raw_author_name	Chi, Yuejie
authorships[3].is_corresponding	False
has_content.pdf	False
has_content.grobid_xml	False
is_paratext	False
open_access.is_oa	True
open_access.oa_url	https://arxiv.org/pdf/2402.05876
open_access.oa_status	green
open_access.any_repository_has_fulltext	False
created_date	2024-02-10T00:00:00
display_name	Federated Offline Reinforcement Learning: Collaborative Single-Policy Coverage Suffices
has_fulltext	False
is_retracted	False
updated_date	2025-11-06T06:51:31.235846
primary_topic.id	https://openalex.org/T11942
primary_topic.field.id	https://openalex.org/fields/22
primary_topic.field.display_name	Engineering
primary_topic.score	0.9397000074386597
primary_topic.domain.id	https://openalex.org/domains/3
primary_topic.domain.display_name	Physical Sciences
primary_topic.subfield.id	https://openalex.org/subfields/2203
primary_topic.subfield.display_name	Automotive Engineering
primary_topic.display_name	Transportation and Mobility Innovations
related_works	https://openalex.org/W2748952813, https://openalex.org/W2920061524, https://openalex.org/W4310083477, https://openalex.org/W1977959518, https://openalex.org/W2038908348, https://openalex.org/W2107890255, https://openalex.org/W2106552856, https://openalex.org/W1987513656, https://openalex.org/W2072376847, https://openalex.org/W2089013912
cited_by_count	0
locations_count	2
best_oa_location.id	pmh:oai:arXiv.org:2402.05876
best_oa_location.is_oa	True
best_oa_location.source.id	https://openalex.org/S4306400194
best_oa_location.source.issn
best_oa_location.source.type	repository
best_oa_location.source.is_oa	True
best_oa_location.source.issn_l
best_oa_location.source.is_core	False
best_oa_location.source.is_in_doaj	False
best_oa_location.source.display_name	arXiv (Cornell University)
best_oa_location.source.host_organization	https://openalex.org/I205783295
best_oa_location.source.host_organization_name	Cornell University
best_oa_location.source.host_organization_lineage	https://openalex.org/I205783295
best_oa_location.license
best_oa_location.pdf_url	https://arxiv.org/pdf/2402.05876
best_oa_location.version	submittedVersion
best_oa_location.raw_type	text
best_oa_location.license_id
best_oa_location.is_accepted	False
best_oa_location.is_published	False
best_oa_location.raw_source_name
best_oa_location.landing_page_url	http://arxiv.org/abs/2402.05876
primary_location.id	pmh:oai:arXiv.org:2402.05876
primary_location.is_oa	True
primary_location.source.id	https://openalex.org/S4306400194
primary_location.source.issn
primary_location.source.type	repository
primary_location.source.is_oa	True
primary_location.source.issn_l
primary_location.source.is_core	False
primary_location.source.is_in_doaj	False
primary_location.source.display_name	arXiv (Cornell University)
primary_location.source.host_organization	https://openalex.org/I205783295
primary_location.source.host_organization_name	Cornell University
primary_location.source.host_organization_lineage	https://openalex.org/I205783295
primary_location.license
primary_location.pdf_url	https://arxiv.org/pdf/2402.05876
primary_location.version	submittedVersion
primary_location.raw_type	text
primary_location.license_id
primary_location.is_accepted	False
primary_location.is_published	False
primary_location.raw_source_name
primary_location.landing_page_url	http://arxiv.org/abs/2402.05876
publication_date	2024-02-08
publication_year	2024
referenced_works_count	0
cited_by_percentile_year
countries_distinct_count	0
institutions_distinct_count	4
citation_normalized_percentile