Federated Offline Reinforcement Learning: Collaborative Single-Policy Coverage Suffices
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2402.05876
Offline reinforcement learning (RL), which seeks to learn an optimal policy using offline data, has garnered significant interest due to its potential in critical applications where online data collection is infeasible or expensive. This work explores the benefit of federated learning for offline RL, aiming at collaboratively leveraging offline datasets at multiple agents. Focusing on finite-horizon episodic tabular Markov decision processes (MDPs), we design FedLCB-Q, a variant of the popular model-free Q-learning algorithm tailored for federated offline RL. FedLCB-Q updates local Q-functions at agents with novel learning rate schedules and aggregates them at a central server using importance averaging and a carefully designed pessimistic penalty term. Our sample complexity analysis reveals that, with appropriately chosen parameters and synchronization schedules, FedLCB-Q achieves linear speedup in terms of the number of agents without requiring high-quality datasets at individual agents, as long as the local datasets collectively cover the state-action space visited by the optimal policy, highlighting the power of collaboration in the federated setting. In fact, the sample complexity almost matches that of the single-agent counterpart, as if all the data are stored at a central location, up to polynomial factors of the horizon length. Furthermore, FedLCB-Q is communication-efficient, where the number of communication rounds is only linear with respect to the horizon length up to logarithmic factors.
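The local-update and server-aggregation loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' FedLCB-Q: the learning rate `(H + 1) / (H + n)`, the visit-count-weighted averaging, the penalty `c_b * sqrt(H / n)`, the `sync_every` schedule, and every function and variable name below are assumptions chosen for illustration only. Rewards are assumed to lie in [0, 1], and each episode is assumed to be a list of `H` tuples `(s, a, r, s_next)`.

```python
import math


def fed_pessimistic_q(datasets, n_states, n_actions, horizon,
                      sync_every=2, c_b=0.05):
    """Illustrative federated pessimistic Q-learning (hypothetical sketch).

    datasets: one list of episodes per agent; an episode is a list of
    `horizon` tuples (s, a, r, s_next).
    """
    H, S, A = horizon, n_states, n_actions
    # Server state: unpenalized average, pessimistic Q, pessimistic V, counts.
    q_bar = [[[0.0] * A for _ in range(S)] for _ in range(H)]
    q_lcb = [[[0.0] * A for _ in range(S)] for _ in range(H)]
    v_lcb = [[0.0] * S for _ in range(H + 1)]  # v_lcb[H] stays 0 (terminal)
    n_tot = [[[0] * A for _ in range(S)] for _ in range(H)]

    n_episodes = min(len(d) for d in datasets)
    for start in range(0, n_episodes, sync_every):
        local_qs, local_ns = [], []
        for data in datasets:
            # Each agent starts the round from the server's estimate.
            q = [[row[:] for row in step] for step in q_bar]
            n = [[[0] * A for _ in range(S)] for _ in range(H)]
            for episode in data[start:start + sync_every]:
                for h, (s, a, r, s_next) in enumerate(episode):
                    n[h][s][a] += 1
                    visits = n_tot[h][s][a] + n[h][s][a]
                    eta = (H + 1) / (H + visits)  # assumed rate schedule
                    target = r + v_lcb[h + 1][s_next]
                    q[h][s][a] = (1 - eta) * q[h][s][a] + eta * target
            local_qs.append(q)
            local_ns.append(n)

        # Server: weight each agent by how often it visited (h, s, a) this
        # round, then subtract a count-based penalty (assumed form).
        for h in range(H):
            for s in range(S):
                for a in range(A):
                    total = sum(n[h][s][a] for n in local_ns)
                    if total == 0:
                        continue  # no agent saw this pair in this round
                    q_bar[h][s][a] = sum(
                        n[h][s][a] * q[h][s][a]
                        for q, n in zip(local_qs, local_ns)) / total
                    n_tot[h][s][a] += total
                    penalty = c_b * math.sqrt(H / n_tot[h][s][a])
                    q_lcb[h][s][a] = max(0.0, q_bar[h][s][a] - penalty)
                v_lcb[h][s] = max(v_lcb[h][s], max(q_lcb[h][s]))

    # Greedy policy with respect to the pessimistic Q-function.
    policy = [[max(range(A), key=lambda a: q_lcb[h][s][a]) for s in range(S)]
              for h in range(H)]
    return q_lcb, policy
```

In a toy case where one agent's data contains only a poor action and another agent's data contains only the good one, neither dataset is informative alone, yet the count-weighted average lets the server recover the good action. This loosely mirrors the collective-coverage condition stated in the abstract, without reproducing its guarantees.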
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2402.05876, https://arxiv.org/pdf/2402.05876
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4391710056
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4391710056 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2402.05876 (Digital Object Identifier)
- Title: Federated Offline Reinforcement Learning: Collaborative Single-Policy Coverage Suffices (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-02-08 (full publication date if available)
- Authors: Jiin Woo, Laixi Shi, Gauri Joshi, Yuejie Chi (list of authors in order)
- Landing page: https://arxiv.org/abs/2402.05876 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2402.05876 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2402.05876 (direct OA link when available)
- Concepts: Reinforcement learning, Computer science, Reinforcement, Artificial intelligence, Psychology, Social psychology (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4391710056 |
| doi | https://doi.org/10.48550/arxiv.2402.05876 |
| ids.doi | https://doi.org/10.48550/arxiv.2402.05876 |
| ids.openalex | https://openalex.org/W4391710056 |
| fwci | |
| type | preprint |
| title | Federated Offline Reinforcement Learning: Collaborative Single-Policy Coverage Suffices |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11942 |
| topics[0].field.id | https://openalex.org/fields/22 |
| topics[0].field.display_name | Engineering |
| topics[0].score | 0.9397000074386597 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/2203 |
| topics[0].subfield.display_name | Automotive Engineering |
| topics[0].display_name | Transportation and Mobility Innovations |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C97541855 |
| concepts[0].level | 2 |
| concepts[0].score | 0.7879638671875 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q830687 |
| concepts[0].display_name | Reinforcement learning |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.5677229166030884 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C67203356 |
| concepts[2].level | 2 |
| concepts[2].score | 0.5671182870864868 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q1321905 |
| concepts[2].display_name | Reinforcement |
| concepts[3].id | https://openalex.org/C154945302 |
| concepts[3].level | 1 |
| concepts[3].score | 0.33068013191223145 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[3].display_name | Artificial intelligence |
| concepts[4].id | https://openalex.org/C15744967 |
| concepts[4].level | 0 |
| concepts[4].score | 0.1921556293964386 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q9418 |
| concepts[4].display_name | Psychology |
| concepts[5].id | https://openalex.org/C77805123 |
| concepts[5].level | 1 |
| concepts[5].score | 0.11188578605651855 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q161272 |
| concepts[5].display_name | Social psychology |
| keywords[0].id | https://openalex.org/keywords/reinforcement-learning |
| keywords[0].score | 0.7879638671875 |
| keywords[0].display_name | Reinforcement learning |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.5677229166030884 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/reinforcement |
| keywords[2].score | 0.5671182870864868 |
| keywords[2].display_name | Reinforcement |
| keywords[3].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[3].score | 0.33068013191223145 |
| keywords[3].display_name | Artificial intelligence |
| keywords[4].id | https://openalex.org/keywords/psychology |
| keywords[4].score | 0.1921556293964386 |
| keywords[4].display_name | Psychology |
| keywords[5].id | https://openalex.org/keywords/social-psychology |
| keywords[5].score | 0.11188578605651855 |
| keywords[5].display_name | Social psychology |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2402.05876 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2402.05876 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2402.05876 |
| locations[1].id | doi:10.48550/arxiv.2402.05876 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2402.05876 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5102514656 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Jiin Woo |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Woo, Jiin |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5075795654 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-4038-8620 |
| authorships[1].author.display_name | Laixi Shi |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Shi, Laixi |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5067441201 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-6372-9697 |
| authorships[2].author.display_name | Gauri Joshi |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Joshi, Gauri |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5053809095 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-6766-5459 |
| authorships[3].author.display_name | Yuejie Chi |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Chi, Yuejie |
| authorships[3].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2402.05876 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2024-02-10T00:00:00 |
| display_name | Federated Offline Reinforcement Learning: Collaborative Single-Policy Coverage Suffices |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11942 |
| primary_topic.field.id | https://openalex.org/fields/22 |
| primary_topic.field.display_name | Engineering |
| primary_topic.score | 0.9397000074386597 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/2203 |
| primary_topic.subfield.display_name | Automotive Engineering |
| primary_topic.display_name | Transportation and Mobility Innovations |
| related_works | https://openalex.org/W2748952813, https://openalex.org/W2920061524, https://openalex.org/W4310083477, https://openalex.org/W1977959518, https://openalex.org/W2038908348, https://openalex.org/W2107890255, https://openalex.org/W2106552856, https://openalex.org/W1987513656, https://openalex.org/W2072376847, https://openalex.org/W2089013912 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2402.05876 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2402.05876 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2402.05876 |
| primary_location.id | pmh:oai:arXiv.org:2402.05876 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2402.05876 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2402.05876 |
| publication_date | 2024-02-08 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 65, 93, 100, 182 |
| abstract_inverted_index.In | 162 |
| abstract_inverted_index.an | 8 |
| abstract_inverted_index.as | 137, 139, 174 |
| abstract_inverted_index.at | 45, 50, 82, 92, 134, 181 |
| abstract_inverted_index.by | 149 |
| abstract_inverted_index.if | 175 |
| abstract_inverted_index.in | 22, 123, 158 |
| abstract_inverted_index.is | 29, 195, 203 |
| abstract_inverted_index.of | 38, 67, 125, 128, 156, 170, 189, 200 |
| abstract_inverted_index.on | 54 |
| abstract_inverted_index.or | 31 |
| abstract_inverted_index.to | 6, 19, 186, 208, 213 |
| abstract_inverted_index.up | 185, 212 |
| abstract_inverted_index.we | 62 |
| abstract_inverted_index.Our | 106 |
| abstract_inverted_index.RL, | 43 |
| abstract_inverted_index.RL. | 77 |
| abstract_inverted_index.all | 176 |
| abstract_inverted_index.and | 89, 99, 116 |
| abstract_inverted_index.are | 179 |
| abstract_inverted_index.due | 18 |
| abstract_inverted_index.for | 41, 74 |
| abstract_inverted_index.has | 14 |
| abstract_inverted_index.its | 20 |
| abstract_inverted_index.the | 36, 68, 126, 140, 145, 150, 154, 159, 164, 171, 177, 190, 198, 209 |
| abstract_inverted_index.This | 33 |
| abstract_inverted_index.data | 27, 178 |
| abstract_inverted_index.long | 138 |
| abstract_inverted_index.only | 204 |
| abstract_inverted_index.rate | 87 |
| abstract_inverted_index.that | 169 |
| abstract_inverted_index.them | 91 |
| abstract_inverted_index.with | 84, 112, 206 |
| abstract_inverted_index.work | 34 |
| abstract_inverted_index.(RL), | 3 |
| abstract_inverted_index.cover | 144 |
| abstract_inverted_index.data, | 13 |
| abstract_inverted_index.fact, | 163 |
| abstract_inverted_index.learn | 7 |
| abstract_inverted_index.local | 80, 141 |
| abstract_inverted_index.novel | 85 |
| abstract_inverted_index.power | 155 |
| abstract_inverted_index.seeks | 5 |
| abstract_inverted_index.space | 147 |
| abstract_inverted_index.term. | 105 |
| abstract_inverted_index.terms | 124 |
| abstract_inverted_index.that, | 111 |
| abstract_inverted_index.using | 11, 96 |
| abstract_inverted_index.where | 25, 197 |
| abstract_inverted_index.which | 4 |
| abstract_inverted_index.Markov | 58 |
| abstract_inverted_index.agents | 83, 129 |
| abstract_inverted_index.aiming | 44 |
| abstract_inverted_index.almost | 167 |
| abstract_inverted_index.chosen | 114 |
| abstract_inverted_index.design | 63 |
| abstract_inverted_index.length | 211 |
| abstract_inverted_index.linear | 121, 205 |
| abstract_inverted_index.number | 127, 199 |
| abstract_inverted_index.online | 26 |
| abstract_inverted_index.policy | 10 |
| abstract_inverted_index.rounds | 202 |
| abstract_inverted_index.sample | 107, 165 |
| abstract_inverted_index.server | 95 |
| abstract_inverted_index.stored | 180 |
| abstract_inverted_index.(MDPs), | 61 |
| abstract_inverted_index.Offline | 0 |
| abstract_inverted_index.agents, | 136 |
| abstract_inverted_index.agents. | 52 |
| abstract_inverted_index.benefit | 37 |
| abstract_inverted_index.central | 94, 183 |
| abstract_inverted_index.factors | 188 |
| abstract_inverted_index.horizon | 191, 210 |
| abstract_inverted_index.length. | 192 |
| abstract_inverted_index.matches | 168 |
| abstract_inverted_index.offline | 12, 42, 48, 76 |
| abstract_inverted_index.optimal | 9, 151 |
| abstract_inverted_index.penalty | 104 |
| abstract_inverted_index.policy, | 152 |
| abstract_inverted_index.popular | 69 |
| abstract_inverted_index.respect | 207 |
| abstract_inverted_index.reveals | 110 |
| abstract_inverted_index.speedup | 122 |
| abstract_inverted_index.tabular | 57 |
| abstract_inverted_index.updates | 79 |
| abstract_inverted_index.variant | 66 |
| abstract_inverted_index.visited | 148 |
| abstract_inverted_index.without | 130 |
| abstract_inverted_index.FedLCB-Q | 78, 119, 194 |
| abstract_inverted_index.Focusing | 53 |
| abstract_inverted_index.achieves | 120 |
| abstract_inverted_index.analysis | 109 |
| abstract_inverted_index.critical | 23 |
| abstract_inverted_index.datasets | 49, 133, 142 |
| abstract_inverted_index.decision | 59 |
| abstract_inverted_index.designed | 102 |
| abstract_inverted_index.episodic | 56 |
| abstract_inverted_index.explores | 35 |
| abstract_inverted_index.factors. | 215 |
| abstract_inverted_index.garnered | 15 |
| abstract_inverted_index.interest | 17 |
| abstract_inverted_index.learning | 2, 40, 86 |
| abstract_inverted_index.multiple | 51 |
| abstract_inverted_index.setting. | 161 |
| abstract_inverted_index.tailored | 73 |
| abstract_inverted_index.FedLCB-Q, | 64 |
| abstract_inverted_index.algorithm | 72 |
| abstract_inverted_index.averaging | 98 |
| abstract_inverted_index.carefully | 101 |
| abstract_inverted_index.federated | 39, 75, 160 |
| abstract_inverted_index.location, | 184 |
| abstract_inverted_index.potential | 21 |
| abstract_inverted_index.processes | 60 |
| abstract_inverted_index.requiring | 131 |
| abstract_inverted_index.schedules | 88 |
| abstract_inverted_index.Q-learning | 71 |
| abstract_inverted_index.aggregates | 90 |
| abstract_inverted_index.collection | 28 |
| abstract_inverted_index.complexity | 108, 166 |
| abstract_inverted_index.expensive. | 32 |
| abstract_inverted_index.importance | 97 |
| abstract_inverted_index.individual | 135 |
| abstract_inverted_index.infeasible | 30 |
| abstract_inverted_index.leveraging | 47 |
| abstract_inverted_index.model-free | 70 |
| abstract_inverted_index.parameters | 115 |
| abstract_inverted_index.polynomial | 187 |
| abstract_inverted_index.schedules, | 118 |
| abstract_inverted_index.Q-functions | 81 |
| abstract_inverted_index.logarithmic | 214 |
| abstract_inverted_index.pessimistic | 103 |
| abstract_inverted_index.significant | 16 |
| abstract_inverted_index.Furthermore, | 193 |
| abstract_inverted_index.applications | 24 |
| abstract_inverted_index.collectively | 143 |
| abstract_inverted_index.counterpart, | 173 |
| abstract_inverted_index.high-quality | 132 |
| abstract_inverted_index.highlighting | 153 |
| abstract_inverted_index.single-agent | 172 |
| abstract_inverted_index.state-action | 146 |
| abstract_inverted_index.appropriately | 113 |
| abstract_inverted_index.collaboration | 157 |
| abstract_inverted_index.communication | 201 |
| abstract_inverted_index.reinforcement | 1 |
| abstract_inverted_index.finite-horizon | 55 |
| abstract_inverted_index.collaboratively | 46 |
| abstract_inverted_index.synchronization | 117 |
| abstract_inverted_index.communication-efficient, | 196 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 4 |
| citation_normalized_percentile |
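
The `abstract_inverted_index.*` rows above store the abstract as a mapping from each token to the positions where it occurs. A minimal sketch of turning such a mapping back into running text follows; the function name `rebuild_abstract` is hypothetical, and the sample dictionary is trimmed to the first eight positions taken from the rows above (the full lists for `learning` and `to` contain later positions as well).

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a token -> list-of-positions mapping."""
    by_position = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    # Emit tokens in position order, separated by single spaces.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Trimmed sample: only positions 0 through 7 from the payload table.
sample = {
    "Offline": [0],
    "reinforcement": [1],
    "learning": [2],
    "(RL),": [3],
    "which": [4],
    "seeks": [5],
    "to": [6],
    "learn": [7],
}
print(rebuild_abstract(sample))
# prints: Offline reinforcement learning (RL), which seeks to learn
```

Sorting the positions rather than assuming a contiguous range keeps the function usable on a partial index like the sample, where later positions are missing.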