Layer-Specific Scaling of Positional Encodings for Superior Long-Context Modeling
2025 · Open Access
· DOI: https://doi.org/10.48550/arxiv.2503.04355
Although large language models (LLMs) have achieved significant progress in handling long-context inputs, they still suffer from the "lost-in-the-middle" problem, where crucial information in the middle of the context is often underrepresented or lost. Our extensive experiments reveal that this issue may arise from the rapid long-term decay in Rotary Position Embedding (RoPE). To address this problem, we propose a layer-specific positional encoding scaling method that assigns distinct scaling factors to each layer, slowing down the decay rate caused by RoPE to make the model pay more attention to the middle context. A specially designed genetic algorithm is employed to efficiently select the optimal scaling factors for each layer by incorporating Bezier curves to reduce the search space. Through comprehensive experimentation, we demonstrate that our method significantly alleviates the "lost-in-the-middle" problem. Our approach results in an average accuracy improvement of up to 20% on the Key-Value Retrieval dataset. Furthermore, we show that layer-specific interpolation, as opposed to uniform interpolation across all layers, enhances the model's extrapolation capabilities when combined with PI and Dynamic-NTK positional encoding schemes.
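The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `bezier_scales` and `rope_angles`, the control values, the layer count, and the choice to apply a scale by dividing the position index (as in position interpolation) are all assumptions made for illustration. The sketch shows how a few Bezier control values can generate one scaling factor per layer, so that a search procedure such as a genetic algorithm would only need to tune the control values rather than every layer's factor.

```python
import math


def bezier_scales(control_values, num_layers):
    """Evaluate a 1-D Bezier curve (De Casteljau) at evenly spaced t in [0, 1].

    Returns one value per layer; `control_values` are hypothetical search variables.
    """
    scales = []
    for layer in range(num_layers):
        t = layer / (num_layers - 1) if num_layers > 1 else 0.0
        pts = list(control_values)
        # Repeated linear interpolation collapses the control polygon to one point.
        while len(pts) > 1:
            pts = [(1 - t) * a + t * b for a, b in zip(pts, pts[1:])]
        scales.append(pts[0])
    return scales


def rope_angles(position, head_dim, scale=1.0, base=10000.0):
    """RoPE rotation angles for one position, with the position divided by `scale`.

    Dividing the position is an assumption here; a larger scale shrinks the
    effective relative distance and so slows the long-term decay.
    """
    scaled_pos = position / scale
    return [scaled_pos * base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


# Four control values describe the scaling factors of all eight (hypothetical) layers.
scales = bezier_scales([1.0, 1.8, 1.2, 1.5], num_layers=8)
angles_per_layer = [rope_angles(100, head_dim=8, scale=s) for s in scales]
```

With four control values the search space has four dimensions regardless of model depth, which is the kind of reduction the abstract attributes to the Bezier parameterization.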
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2503.04355, https://arxiv.org/pdf/2503.04355
- OA Status: green
- OpenAlex ID: https://openalex.org/W4416113221
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4416113221 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2503.04355 (Digital Object Identifier)
- Title: Layer-Specific Scaling of Positional Encodings for Superior Long-Context Modeling (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025 (year of publication)
- Publication date: 2025-03-06 (full publication date if available)
- Authors: Changze Lv, Zhigang Xu, Tianlong Li, Tianyuan Shi, Xiaoqing Zheng, Xuanjing Huang (list of authors in order)
- Landing page: https://arxiv.org/abs/2503.04355 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2503.04355 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2503.04355 (direct OA link when available)
- Cited by: 0 (total citation count in OpenAlex)
Full payload
| id | https://openalex.org/W4416113221 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2503.04355 |
| ids.doi | https://doi.org/10.48550/arxiv.2503.04355 |
| ids.openalex | https://openalex.org/W4416113221 |
| fwci | |
| type | preprint |
| title | Layer-Specific Scaling of Positional Encodings for Superior Long-Context Modeling |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2503.04355 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2503.04355 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2503.04355 |
| locations[1].id | doi:10.48550/arxiv.2503.04355 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2503.04355 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5113112983 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Changze Lv |
| authorships[0].author_position | middle |
| authorships[0].raw_author_name | Lv, Changze |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5075545680 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-4477-7391 |
| authorships[1].author.display_name | Zhigang Xu |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Xu, Zhibo |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5020841033 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-9483-457X |
| authorships[2].author.display_name | Tianlong Li |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Li, Tianlong |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5101107477 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Tianyuan Shi |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Shi, Tianyuan |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5101002645 |
| authorships[4].author.orcid | |
| authorships[4].author.display_name | Xiaoqing Zheng |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Zheng, Xiaoqing |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5088834359 |
| authorships[5].author.orcid | https://orcid.org/0000-0001-9197-9426 |
| authorships[5].author.display_name | Xuanjing Huang |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Huang, Xuanjing |
| authorships[5].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2503.04355 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Layer-Specific Scaling of Positional Encodings for Superior Long-Context Modeling |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-28T05:25:28.570994 |
| primary_topic | |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2503.04355 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2503.04355 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2503.04355 |
| primary_location.id | pmh:oai:arXiv.org:2503.04355 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2503.04355 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2503.04355 |
| publication_date | 2025-03-06 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.A | 92 |
| abstract_inverted_index.a | 59 |
| abstract_inverted_index.PI | 170 |
| abstract_inverted_index.To | 53 |
| abstract_inverted_index.an | 135 |
| abstract_inverted_index.as | 154 |
| abstract_inverted_index.by | 79, 109 |
| abstract_inverted_index.in | 9, 23, 48, 134 |
| abstract_inverted_index.is | 29, 97 |
| abstract_inverted_index.of | 26, 139 |
| abstract_inverted_index.on | 143 |
| abstract_inverted_index.or | 32 |
| abstract_inverted_index.to | 70, 81, 88, 99, 113, 141, 156 |
| abstract_inverted_index.up | 140 |
| abstract_inverted_index.we | 57, 121, 149 |
| abstract_inverted_index.20% | 142 |
| abstract_inverted_index.Our | 34, 131 |
| abstract_inverted_index.all | 160 |
| abstract_inverted_index.and | 171 |
| abstract_inverted_index.for | 106 |
| abstract_inverted_index.may | 41 |
| abstract_inverted_index.our | 124 |
| abstract_inverted_index.pay | 85 |
| abstract_inverted_index.the | 17, 24, 27, 44, 75, 83, 89, 102, 115, 128, 144, 163 |
| abstract_inverted_index.RoPE | 80 |
| abstract_inverted_index.down | 74 |
| abstract_inverted_index.each | 71, 107 |
| abstract_inverted_index.from | 16, 43 |
| abstract_inverted_index.have | 5 |
| abstract_inverted_index.make | 82 |
| abstract_inverted_index.more | 86 |
| abstract_inverted_index.rate | 77 |
| abstract_inverted_index.show | 150 |
| abstract_inverted_index.that | 38, 65, 123, 151 |
| abstract_inverted_index.they | 13 |
| abstract_inverted_index.this | 39, 55 |
| abstract_inverted_index.when | 167 |
| abstract_inverted_index.with | 169 |
| abstract_inverted_index.arise | 42 |
| abstract_inverted_index.decay | 47, 76 |
| abstract_inverted_index.issue | 40 |
| abstract_inverted_index.large | 1 |
| abstract_inverted_index.layer | 108 |
| abstract_inverted_index.lost. | 33 |
| abstract_inverted_index.model | 84 |
| abstract_inverted_index.often | 30 |
| abstract_inverted_index.rapid | 45 |
| abstract_inverted_index.still | 14 |
| abstract_inverted_index.where | 20 |
| abstract_inverted_index.(LLMs) | 4 |
| abstract_inverted_index.Bezier | 111 |
| abstract_inverted_index.Rotary | 49 |
| abstract_inverted_index.across | 159 |
| abstract_inverted_index.caused | 78 |
| abstract_inverted_index.curves | 112 |
| abstract_inverted_index.layer, | 72 |
| abstract_inverted_index.method | 64, 125 |
| abstract_inverted_index.middle | 25, 90 |
| abstract_inverted_index.models | 3 |
| abstract_inverted_index.reduce | 114 |
| abstract_inverted_index.reveal | 37 |
| abstract_inverted_index.search | 116 |
| abstract_inverted_index.select | 101 |
| abstract_inverted_index.space. | 117 |
| abstract_inverted_index.suffer | 15 |
| abstract_inverted_index.(RoPE). | 52 |
| abstract_inverted_index.Through | 118 |
| abstract_inverted_index.address | 54 |
| abstract_inverted_index.assigns | 66 |
| abstract_inverted_index.average | 136 |
| abstract_inverted_index.context | 28 |
| abstract_inverted_index.crucial | 21 |
| abstract_inverted_index.factors | 69, 105 |
| abstract_inverted_index.genetic | 95 |
| abstract_inverted_index.inputs, | 12 |
| abstract_inverted_index.layers, | 161 |
| abstract_inverted_index.model's | 164 |
| abstract_inverted_index.opposed | 155 |
| abstract_inverted_index.optimal | 103 |
| abstract_inverted_index.propose | 58 |
| abstract_inverted_index.results | 133 |
| abstract_inverted_index.scaling | 63, 68, 104 |
| abstract_inverted_index.slowing | 73 |
| abstract_inverted_index.uniform | 157 |
| abstract_inverted_index.Although | 0 |
| abstract_inverted_index.Position | 50 |
| abstract_inverted_index.accuracy | 137 |
| abstract_inverted_index.achieved | 6 |
| abstract_inverted_index.approach | 132 |
| abstract_inverted_index.combined | 168 |
| abstract_inverted_index.context. | 91 |
| abstract_inverted_index.dataset. | 147 |
| abstract_inverted_index.designed | 94 |
| abstract_inverted_index.distinct | 67 |
| abstract_inverted_index.employed | 98 |
| abstract_inverted_index.encoding | 62, 174 |
| abstract_inverted_index.enhances | 162 |
| abstract_inverted_index.handling | 10 |
| abstract_inverted_index.language | 2 |
| abstract_inverted_index.problem, | 19, 56 |
| abstract_inverted_index.problem. | 130 |
| abstract_inverted_index.progress | 8 |
| abstract_inverted_index.schemes. | 175 |
| abstract_inverted_index.Embedding | 51 |
| abstract_inverted_index.Key-Value | 145 |
| abstract_inverted_index.Retrieval | 146 |
| abstract_inverted_index.algorithm | 96 |
| abstract_inverted_index.attention | 87 |
| abstract_inverted_index.extensive | 35 |
| abstract_inverted_index.long-term | 46 |
| abstract_inverted_index.specially | 93 |
| abstract_inverted_index.alleviates | 127 |
| abstract_inverted_index.positional | 61, 173 |
| abstract_inverted_index.Dynamic-NTK | 172 |
| abstract_inverted_index.demonstrate | 122 |
| abstract_inverted_index.efficiently | 100 |
| abstract_inverted_index.experiments | 36 |
| abstract_inverted_index.improvement | 138 |
| abstract_inverted_index.information | 22 |
| abstract_inverted_index.significant | 7 |
| abstract_inverted_index.Furthermore, | 148 |
| abstract_inverted_index.capabilities | 166 |
| abstract_inverted_index.long-context | 11 |
| abstract_inverted_index.comprehensive | 119 |
| abstract_inverted_index.extrapolation | 165 |
| abstract_inverted_index.incorporating | 110 |
| abstract_inverted_index.interpolation | 158 |
| abstract_inverted_index.significantly | 126 |
| abstract_inverted_index.interpolation, | 153 |
| abstract_inverted_index.layer-specific | 60, 152 |
| abstract_inverted_index.experimentation, | 120 |
| abstract_inverted_index.underrepresented | 31 |
| abstract_inverted_index.``lost-in-the-middle'' | 18, 129 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 6 |
| citation_normalized_percentile | |
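
The `abstract_inverted_index.*` rows above store the abstract as a map from each token to the positions where it occurs. A short sketch of how the plain-text abstract can be rebuilt from such a map follows. The helper name `rebuild_abstract` is hypothetical, and the sample dictionary is a hand-copied subset covering only positions 0 to 9 of the table (later positions of "in" are omitted), so it yields only the opening words.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a {token: [positions]} map by sorting tokens by position."""
    positions = {}
    for token, indices in inverted_index.items():
        for index in indices:
            positions[index] = token
    return " ".join(positions[i] for i in sorted(positions))


# Subset of the rows in the payload table (positions 0-9 only).
sample = {
    "Although": [0],
    "large": [1],
    "language": [2],
    "models": [3],
    "(LLMs)": [4],
    "have": [5],
    "achieved": [6],
    "significant": [7],
    "progress": [8],
    "in": [9],
}

opening = rebuild_abstract(sample)
print(opening)  # Although large language models (LLMs) have achieved significant progress in
```

Passing the full set of rows in the same form would reproduce the complete abstract, punctuation included, since punctuation is stored as part of each token.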