Transformers Meet In-Context Learning: A Universal Approximation Theory
2025 · Open Access
DOI: https://doi.org/10.48550/arxiv.2506.05200
Large language models are capable of in-context learning, the ability to perform new tasks at test time using a handful of input-output examples, without parameter updates. We develop a universal approximation theory to elucidate how transformers enable in-context learning. For a general class of functions (each representing a distinct task), we demonstrate how to construct a transformer that, without any further weight updates, can predict based on a few noisy in-context examples with vanishingly small risk. Unlike prior work that frames transformers as approximators of optimization algorithms (e.g., gradient descent) for statistical learning tasks, we integrate Barron's universal function approximation theory with the algorithm approximator viewpoint. Our approach yields approximation guarantees that are not constrained by the effectiveness of the optimization algorithms being mimicked, extending far beyond convex problems like linear regression. The key is to show that (i) any target function can be nearly linearly represented, with small $\ell_1$-norm, over a set of universal features, and (ii) a transformer can be constructed to find the linear representation -- akin to solving Lasso -- at test time.
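The abstract's two-part argument can be made concrete with a small numerical sketch. This is an illustration, not the paper's construction: random ReLU features stand in for the "universal features" of point (i), and an explicit ISTA loop stands in for the transformer that, per point (ii), implicitly solves a Lasso problem over the noisy in-context examples at test time. All names and constants below (the task `f`, the penalty `lam`, the feature count `m`) are assumptions chosen for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, m = 2, 60, 300                          # input dim, in-context examples, features
f = lambda X: np.sin(np.pi * X[:, 0]) + 0.5 * X[:, 1]  # one task from the class (assumed)

X = rng.uniform(-1.0, 1.0, size=(n, d))       # in-context inputs
y = f(X) + 0.05 * rng.standard_normal(n)      # noisy in-context labels

# Random ReLU features as a stand-in for the paper's universal feature set.
W = rng.standard_normal((m, d))
b = rng.uniform(-1.0, 1.0, size=m)
phi = lambda X: np.maximum(X @ W.T + b, 0.0)

Phi = phi(X)
lam = 0.01                                    # l1 penalty (not tuned)
step = 1.0 / np.linalg.eigvalsh(Phi.T @ Phi / n).max()  # safe ISTA step size

# ISTA: gradient step on the squared loss, then soft-thresholding,
# which finds a small-l1-norm linear representation over the features.
beta = np.zeros(m)
for _ in range(500):
    grad = Phi.T @ (Phi @ beta - y) / n
    z = beta - step * grad
    beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)

# "In-context" prediction on fresh query inputs, with no refitting of W, b.
X_query = rng.uniform(-1.0, 1.0, size=(5, d))
pred = phi(X_query) @ beta
```

The point of the sketch is structural: the features are fixed in advance and shared across tasks, and only a sparse linear readout is computed from the examples, mirroring how the constructed transformer adapts without weight updates.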
Overview
- Type: preprint
- Language: en
- Landing page: http://arxiv.org/abs/2506.05200
- PDF: https://arxiv.org/pdf/2506.05200
- OA status: green
- OpenAlex ID: https://openalex.org/W4416138639
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4416138639 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2506.05200 (Digital Object Identifier)
- Title: Transformers Meet In-Context Learning: A Universal Approximation Theory
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025
- Publication date: 2025-06-05
- Authors: Yang Jiao, Yuting Wei, Yuxin Chen (in order)
- Landing page: https://arxiv.org/abs/2506.05200
- PDF URL: https://arxiv.org/pdf/2506.05200
- Open access: Yes (free full text available)
- OA status: green (per OpenAlex)
- OA URL: https://arxiv.org/pdf/2506.05200
- Cited by: 0 (total citation count in OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4416138639 |
| doi | https://doi.org/10.48550/arxiv.2506.05200 |
| ids.doi | https://doi.org/10.48550/arxiv.2506.05200 |
| ids.openalex | https://openalex.org/W4416138639 |
| fwci | |
| type | preprint |
| title | Transformers Meet In-Context Learning: A Universal Approximation Theory |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2506.05200 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2506.05200 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2506.05200 |
| locations[1].id | doi:10.48550/arxiv.2506.05200 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2506.05200 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5100604776 |
| authorships[0].author.orcid | https://orcid.org/0000-0001-9582-1650 |
| authorships[0].author.display_name | Yang Jiao |
| authorships[0].author_position | middle |
| authorships[0].raw_author_name | Jiao, Yuchen |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5005015806 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-1488-4647 |
| authorships[1].author.display_name | Yuting Wei |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Wei, Yuting |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5060273231 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Yuxin Chen |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Chen, Yuxin |
| authorships[2].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2506.05200 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Transformers Meet In-Context Learning: A Universal Approximation Theory |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-28T06:08:08.221904 |
| primary_topic | |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2506.05200 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2506.05200 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2506.05200 |
| primary_location.id | pmh:oai:arXiv.org:2506.05200 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2506.05200 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2506.05200 |
| publication_date | 2025-06-05 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index | (word-to-positions index of the abstract; plain text given above) |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 3 |
| citation_normalized_percentile | |
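For copyright reasons OpenAlex does not ship abstracts as plain text; the `abstract_inverted_index` field of a Work object maps each word to the list of positions where it occurs. A minimal sketch for rebuilding the readable abstract from a payload in that shape:

```python
def reconstruct_abstract(inverted_index):
    """Rebuild abstract text from an OpenAlex abstract_inverted_index mapping."""
    # Each key is a word; each value lists the positions where it appears.
    slots = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            slots[pos] = word
    # Emit words in position order, joined by single spaces.
    return " ".join(slots[i] for i in sorted(slots))

# Tiny example in the same shape as the payload field:
idx = {"Large": [0], "language": [1], "models": [2], "are": [3]}
print(reconstruct_abstract(idx))  # Large language models are
```

Applied to the full `abstract_inverted_index` of this work, the function recovers the abstract shown at the top of the page.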