Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation Article Swipe
YOU?
·
· 2022
· Open Access
·
· DOI: https://doi.org/10.48550/arxiv.2210.05918
We study the finite-time behaviour of the popular temporal difference (TD) learning algorithm when combined with tail-averaging. We derive finite time bounds on the parameter error of the tail-averaged TD iterate under a step-size choice that does not require information about the eigenvalues of the matrix underlying the projected TD fixed point. Our analysis shows that tail-averaged TD converges at the optimal $O\left(1/t\right)$ rate, both in expectation and with high probability. In addition, our bounds exhibit a sharper rate of decay for the initial error (bias), which is an improvement over averaging all iterates. We also propose and analyse a variant of TD that incorporates regularisation. From analysis, we conclude that the regularised version of TD is useful for problems with ill-conditioned features.
Related Topics
- Type
- preprint
- Language
- en
- Landing Page
- http://arxiv.org/abs/2210.05918
- https://arxiv.org/pdf/2210.05918
- OA Status
- green
- Cited By
- 2
- Related Works
- 10
- OpenAlex ID
- https://openalex.org/W4306177356
Raw OpenAlex JSON
- OpenAlex ID
-
https://openalex.org/W4306177356Canonical identifier for this work in OpenAlex
- DOI
-
https://doi.org/10.48550/arxiv.2210.05918Digital Object Identifier
- Title
-
Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisationWork title
- Type
-
preprintOpenAlex work type
- Language
-
enPrimary language
- Publication year
-
2022Year of publication
- Publication date
-
2022-10-12Full publication date if available
- Authors
-
Gandharv Patil, L. A. Prashanth, Anant Raj, Doina PrecupList of authors in order
- Landing page
-
https://arxiv.org/abs/2210.05918Publisher landing page
- PDF URL
-
https://arxiv.org/pdf/2210.05918Direct link to full text PDF
- Open access
-
YesWhether a free full text is available
- OA status
-
greenOpen access status per OpenAlex
- OA URL
-
https://arxiv.org/pdf/2210.05918Direct OA link when available
- Concepts
-
Mathematics, Iterated function, Eigenvalues and eigenvectors, Applied mathematics, Temporal difference learning, Matrix (chemical analysis), Function (biology), Rate of convergence, Mathematical analysis, Computer science, Physics, Reinforcement learning, Artificial intelligence, Materials science, Channel (broadcasting), Biology, Computer network, Quantum mechanics, Evolutionary biology, Composite materialTop concepts (fields/topics) attached by OpenAlex
- Cited by
-
2Total citation count in OpenAlex
- Citations by year (recent)
-
2025: 1, 2023: 1Per-year citation counts (last 5 years)
- Related works (count)
-
10Other works algorithmically related by OpenAlex
Full payload
| id | https://openalex.org/W4306177356 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2210.05918 |
| ids.doi | https://doi.org/10.48550/arxiv.2210.05918 |
| ids.openalex | https://openalex.org/W4306177356 |
| fwci | 0.38991457 |
| type | preprint |
| title | Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11447 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9947999715805054 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1711 |
| topics[0].subfield.display_name | Signal Processing |
| topics[0].display_name | Blind Source Separation Techniques |
| topics[1].id | https://openalex.org/T12611 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9919999837875366 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Neural Networks and Reservoir Computing |
| topics[2].id | https://openalex.org/T11233 |
| topics[2].field.id | https://openalex.org/fields/22 |
| topics[2].field.display_name | Engineering |
| topics[2].score | 0.9876999855041504 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/2206 |
| topics[2].subfield.display_name | Computational Mechanics |
| topics[2].display_name | Advanced Adaptive Filtering Techniques |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C33923547 |
| concepts[0].level | 0 |
| concepts[0].score | 0.759554922580719 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[0].display_name | Mathematics |
| concepts[1].id | https://openalex.org/C140479938 |
| concepts[1].level | 2 |
| concepts[1].score | 0.7023898959159851 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q5254619 |
| concepts[1].display_name | Iterated function |
| concepts[2].id | https://openalex.org/C158693339 |
| concepts[2].level | 2 |
| concepts[2].score | 0.6746031641960144 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q190524 |
| concepts[2].display_name | Eigenvalues and eigenvectors |
| concepts[3].id | https://openalex.org/C28826006 |
| concepts[3].level | 1 |
| concepts[3].score | 0.5675721764564514 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q33521 |
| concepts[3].display_name | Applied mathematics |
| concepts[4].id | https://openalex.org/C196340769 |
| concepts[4].level | 3 |
| concepts[4].score | 0.4975726902484894 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q7698910 |
| concepts[4].display_name | Temporal difference learning |
| concepts[5].id | https://openalex.org/C106487976 |
| concepts[5].level | 2 |
| concepts[5].score | 0.4685564637184143 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q685816 |
| concepts[5].display_name | Matrix (chemical analysis) |
| concepts[6].id | https://openalex.org/C14036430 |
| concepts[6].level | 2 |
| concepts[6].score | 0.4484899640083313 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q3736076 |
| concepts[6].display_name | Function (biology) |
| concepts[7].id | https://openalex.org/C57869625 |
| concepts[7].level | 3 |
| concepts[7].score | 0.43921172618865967 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q1783502 |
| concepts[7].display_name | Rate of convergence |
| concepts[8].id | https://openalex.org/C134306372 |
| concepts[8].level | 1 |
| concepts[8].score | 0.28793221712112427 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q7754 |
| concepts[8].display_name | Mathematical analysis |
| concepts[9].id | https://openalex.org/C41008148 |
| concepts[9].level | 0 |
| concepts[9].score | 0.13342711329460144 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[9].display_name | Computer science |
| concepts[10].id | https://openalex.org/C121332964 |
| concepts[10].level | 0 |
| concepts[10].score | 0.11512759327888489 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q413 |
| concepts[10].display_name | Physics |
| concepts[11].id | https://openalex.org/C97541855 |
| concepts[11].level | 2 |
| concepts[11].score | 0.09060981869697571 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q830687 |
| concepts[11].display_name | Reinforcement learning |
| concepts[12].id | https://openalex.org/C154945302 |
| concepts[12].level | 1 |
| concepts[12].score | 0.08473944664001465 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[12].display_name | Artificial intelligence |
| concepts[13].id | https://openalex.org/C192562407 |
| concepts[13].level | 0 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q228736 |
| concepts[13].display_name | Materials science |
| concepts[14].id | https://openalex.org/C127162648 |
| concepts[14].level | 2 |
| concepts[14].score | 0.0 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q16858953 |
| concepts[14].display_name | Channel (broadcasting) |
| concepts[15].id | https://openalex.org/C86803240 |
| concepts[15].level | 0 |
| concepts[15].score | 0.0 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q420 |
| concepts[15].display_name | Biology |
| concepts[16].id | https://openalex.org/C31258907 |
| concepts[16].level | 1 |
| concepts[16].score | 0.0 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q1301371 |
| concepts[16].display_name | Computer network |
| concepts[17].id | https://openalex.org/C62520636 |
| concepts[17].level | 1 |
| concepts[17].score | 0.0 |
| concepts[17].wikidata | https://www.wikidata.org/wiki/Q944 |
| concepts[17].display_name | Quantum mechanics |
| concepts[18].id | https://openalex.org/C78458016 |
| concepts[18].level | 1 |
| concepts[18].score | 0.0 |
| concepts[18].wikidata | https://www.wikidata.org/wiki/Q840400 |
| concepts[18].display_name | Evolutionary biology |
| concepts[19].id | https://openalex.org/C159985019 |
| concepts[19].level | 1 |
| concepts[19].score | 0.0 |
| concepts[19].wikidata | https://www.wikidata.org/wiki/Q181790 |
| concepts[19].display_name | Composite material |
| keywords[0].id | https://openalex.org/keywords/mathematics |
| keywords[0].score | 0.759554922580719 |
| keywords[0].display_name | Mathematics |
| keywords[1].id | https://openalex.org/keywords/iterated-function |
| keywords[1].score | 0.7023898959159851 |
| keywords[1].display_name | Iterated function |
| keywords[2].id | https://openalex.org/keywords/eigenvalues-and-eigenvectors |
| keywords[2].score | 0.6746031641960144 |
| keywords[2].display_name | Eigenvalues and eigenvectors |
| keywords[3].id | https://openalex.org/keywords/applied-mathematics |
| keywords[3].score | 0.5675721764564514 |
| keywords[3].display_name | Applied mathematics |
| keywords[4].id | https://openalex.org/keywords/temporal-difference-learning |
| keywords[4].score | 0.4975726902484894 |
| keywords[4].display_name | Temporal difference learning |
| keywords[5].id | https://openalex.org/keywords/matrix |
| keywords[5].score | 0.4685564637184143 |
| keywords[5].display_name | Matrix (chemical analysis) |
| keywords[6].id | https://openalex.org/keywords/function |
| keywords[6].score | 0.4484899640083313 |
| keywords[6].display_name | Function (biology) |
| keywords[7].id | https://openalex.org/keywords/rate-of-convergence |
| keywords[7].score | 0.43921172618865967 |
| keywords[7].display_name | Rate of convergence |
| keywords[8].id | https://openalex.org/keywords/mathematical-analysis |
| keywords[8].score | 0.28793221712112427 |
| keywords[8].display_name | Mathematical analysis |
| keywords[9].id | https://openalex.org/keywords/computer-science |
| keywords[9].score | 0.13342711329460144 |
| keywords[9].display_name | Computer science |
| keywords[10].id | https://openalex.org/keywords/physics |
| keywords[10].score | 0.11512759327888489 |
| keywords[10].display_name | Physics |
| keywords[11].id | https://openalex.org/keywords/reinforcement-learning |
| keywords[11].score | 0.09060981869697571 |
| keywords[11].display_name | Reinforcement learning |
| keywords[12].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[12].score | 0.08473944664001465 |
| keywords[12].display_name | Artificial intelligence |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2210.05918 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2210.05918 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2210.05918 |
| locations[1].id | doi:10.48550/arxiv.2210.05918 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article-journal |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2210.05918 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5018479221 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Gandharv Patil |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Patil, Gandharv |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5068379567 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | L. A. Prashanth |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | A., Prashanth L. |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5071444356 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-0320-4733 |
| authorships[2].author.display_name | Anant Raj |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Nagaraj, Dheeraj |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5065836447 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Doina Precup |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Precup, Doina |
| authorships[3].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2210.05918 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11447 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9947999715805054 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1711 |
| primary_topic.subfield.display_name | Signal Processing |
| primary_topic.display_name | Blind Source Separation Techniques |
| related_works | https://openalex.org/W3103325625, https://openalex.org/W1486898455, https://openalex.org/W3131519263, https://openalex.org/W2041905724, https://openalex.org/W4311726894, https://openalex.org/W2991463004, https://openalex.org/W2088588293, https://openalex.org/W2949122357, https://openalex.org/W2473609169, https://openalex.org/W2329573185 |
| cited_by_count | 2 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 1 |
| counts_by_year[1].year | 2023 |
| counts_by_year[1].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2210.05918 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2210.05918 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2210.05918 |
| primary_location.id | pmh:oai:arXiv.org:2210.05918 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2210.05918 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2210.05918 |
| publication_date | 2022-10-12 |
| publication_year | 2022 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 32, 76, 99 |
| abstract_inverted_index.In | 71 |
| abstract_inverted_index.TD | 29, 49, 57, 102, 115 |
| abstract_inverted_index.We | 0, 17, 94 |
| abstract_inverted_index.an | 88 |
| abstract_inverted_index.at | 59 |
| abstract_inverted_index.in | 65 |
| abstract_inverted_index.is | 87, 116 |
| abstract_inverted_index.of | 5, 26, 43, 79, 101, 114 |
| abstract_inverted_index.on | 22 |
| abstract_inverted_index.we | 108 |
| abstract_inverted_index.Our | 52 |
| abstract_inverted_index.all | 92 |
| abstract_inverted_index.and | 67, 97 |
| abstract_inverted_index.for | 81, 118 |
| abstract_inverted_index.not | 37 |
| abstract_inverted_index.our | 73 |
| abstract_inverted_index.the | 2, 6, 23, 27, 41, 44, 47, 60, 82, 111 |
| abstract_inverted_index.(TD) | 10 |
| abstract_inverted_index.From | 106 |
| abstract_inverted_index.also | 95 |
| abstract_inverted_index.both | 64 |
| abstract_inverted_index.does | 36 |
| abstract_inverted_index.high | 69 |
| abstract_inverted_index.over | 90 |
| abstract_inverted_index.rate | 78 |
| abstract_inverted_index.that | 35, 55, 103, 110 |
| abstract_inverted_index.time | 20 |
| abstract_inverted_index.when | 13 |
| abstract_inverted_index.with | 15, 68, 120 |
| abstract_inverted_index.about | 40 |
| abstract_inverted_index.decay | 80 |
| abstract_inverted_index.error | 25, 84 |
| abstract_inverted_index.fixed | 50 |
| abstract_inverted_index.rate, | 63 |
| abstract_inverted_index.shows | 54 |
| abstract_inverted_index.study | 1 |
| abstract_inverted_index.under | 31 |
| abstract_inverted_index.which | 86 |
| abstract_inverted_index.bounds | 21, 74 |
| abstract_inverted_index.choice | 34 |
| abstract_inverted_index.derive | 18 |
| abstract_inverted_index.finite | 19 |
| abstract_inverted_index.matrix | 45 |
| abstract_inverted_index.point. | 51 |
| abstract_inverted_index.useful | 117 |
| abstract_inverted_index.(bias), | 85 |
| abstract_inverted_index.analyse | 98 |
| abstract_inverted_index.exhibit | 75 |
| abstract_inverted_index.initial | 83 |
| abstract_inverted_index.iterate | 30 |
| abstract_inverted_index.optimal | 61 |
| abstract_inverted_index.popular | 7 |
| abstract_inverted_index.propose | 96 |
| abstract_inverted_index.require | 38 |
| abstract_inverted_index.sharper | 77 |
| abstract_inverted_index.variant | 100 |
| abstract_inverted_index.version | 113 |
| abstract_inverted_index.analysis | 53 |
| abstract_inverted_index.combined | 14 |
| abstract_inverted_index.conclude | 109 |
| abstract_inverted_index.learning | 11 |
| abstract_inverted_index.problems | 119 |
| abstract_inverted_index.temporal | 8 |
| abstract_inverted_index.addition, | 72 |
| abstract_inverted_index.algorithm | 12 |
| abstract_inverted_index.analysis, | 107 |
| abstract_inverted_index.averaging | 91 |
| abstract_inverted_index.behaviour | 4 |
| abstract_inverted_index.converges | 58 |
| abstract_inverted_index.features. | 122 |
| abstract_inverted_index.iterates. | 93 |
| abstract_inverted_index.parameter | 24 |
| abstract_inverted_index.projected | 48 |
| abstract_inverted_index.step-size | 33 |
| abstract_inverted_index.difference | 9 |
| abstract_inverted_index.underlying | 46 |
| abstract_inverted_index.eigenvalues | 42 |
| abstract_inverted_index.expectation | 66 |
| abstract_inverted_index.finite-time | 3 |
| abstract_inverted_index.improvement | 89 |
| abstract_inverted_index.information | 39 |
| abstract_inverted_index.regularised | 112 |
| abstract_inverted_index.incorporates | 104 |
| abstract_inverted_index.probability. | 70 |
| abstract_inverted_index.tail-averaged | 28, 56 |
| abstract_inverted_index.ill-conditioned | 121 |
| abstract_inverted_index.regularisation. | 105 |
| abstract_inverted_index.tail-averaging. | 16 |
| abstract_inverted_index.$O\left(1/t\right)$ | 62 |
| cited_by_percentile_year.max | 95 |
| cited_by_percentile_year.min | 89 |
| countries_distinct_count | 0 |
| institutions_distinct_count | 4 |
| citation_normalized_percentile.value | 0.54177057 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | False |