Adapting Decoder-Based Language Models for Diverse Encoder Downstream Tasks
2025 · Open Access
Decoder-based transformers, while revolutionizing language modeling and scaling to immense sizes, have not completely overtaken encoder-heavy architectures in natural language processing. Specifically, encoder-only models remain dominant in tasks like classification, regression, and ranking. This is primarily due to the inherent structure of decoder-based models, which limits their direct applicability to these tasks. In this paper, we introduce Gemma Encoder, adapting the powerful Gemma decoder model to an encoder architecture, thereby unlocking its potential for a wider range of non-generative applications. To optimize the adaptation from decoder to encoder, we systematically analyze various pooling strategies, attention mechanisms, and hyperparameters (e.g., dropout rate). Furthermore, we benchmark Gemma Encoder against established approaches on the GLUE benchmarks and the MS MARCO ranking benchmark, demonstrating its effectiveness and versatility.
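The adaptation the abstract describes, replacing causal attention with bidirectional attention and pooling token representations for a task head, can be illustrated with a minimal sketch. This is not the authors' released code; the PyTorch framing, the mean-pooling choice, and names such as `ClassificationHead` are illustrative assumptions.

```python
# Minimal sketch (not the authors' implementation) of the two adaptation steps
# the abstract mentions: bidirectional attention over non-padding tokens and
# pooling of token representations for a classification/regression/ranking head.
import torch
import torch.nn as nn

def bidirectional_mask(padding_mask: torch.Tensor) -> torch.Tensor:
    """padding_mask: (batch, seq) with 1 for real tokens, 0 for padding.
    A decoder normally also applies a causal (lower-triangular) mask; for
    encoder-style tasks every token may attend to every non-padding token."""
    return padding_mask[:, None, None, :].bool()  # (batch, 1, 1, seq)

def mean_pool(hidden_states: torch.Tensor, padding_mask: torch.Tensor) -> torch.Tensor:
    """Average token representations over non-padding positions."""
    mask = padding_mask.unsqueeze(-1).to(hidden_states.dtype)  # (batch, seq, 1)
    return (hidden_states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)

class ClassificationHead(nn.Module):
    """Pooled representation -> label logits, with dropout (one of the
    hyperparameters the paper reports studying)."""
    def __init__(self, hidden_size: int, num_labels: int, dropout: float = 0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        self.proj = nn.Linear(hidden_size, num_labels)

    def forward(self, hidden_states, padding_mask):
        return self.proj(self.dropout(mean_pool(hidden_states, padding_mask)))

# Toy usage with random stand-ins for backbone outputs.
states = torch.randn(2, 8, 16)                    # (batch, seq, hidden)
pad = torch.tensor([[1] * 8, [1] * 5 + [0] * 3])  # second example is padded
head = ClassificationHead(hidden_size=16, num_labels=3)
print(head(states, pad).shape)                    # torch.Size([2, 3])
```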
Record Details
- Type: article
- Language: en
- Landing page: http://arxiv.org/abs/2503.02656
- PDF: https://arxiv.org/pdf/2503.02656
- OA status: green
- OpenAlex ID: https://openalex.org/W4415335368
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4415335368 (canonical identifier for this work in OpenAlex)
- Title: Adapting Decoder-Based Language Models for Diverse Encoder Downstream Tasks (work title)
- Type: article (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025
- Publication date: 2025-03-04 (full publication date if available)
- Authors: Paul Suganthan, Fédor Moiseev, Limei Yan, Junru Wu, Jianmo Ni, Jay J. Han, Imed Zitouni, Enrique Alfonseca, Xuanhui Wang, Zhe Dong (in order)
- Landing page: https://arxiv.org/abs/2503.02656 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2503.02656 (direct link to full-text PDF)
- Open access: Yes (a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2503.02656 (direct OA link when available)
- Cited by: 0 (total citation count in OpenAlex)
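For reference, the raw record summarized above can be fetched directly from the public OpenAlex API. A minimal sketch, assuming the standard works endpoint at api.openalex.org; the "Full payload" table below is a flattened view of the same JSON.

```python
# Fetch the raw OpenAlex JSON for this work.
import json
import urllib.request

work_id = "W4415335368"
url = f"https://api.openalex.org/works/{work_id}"

with urllib.request.urlopen(url) as resp:
    work = json.load(resp)

print(work["display_name"])                 # work title
print(work["open_access"]["oa_status"])     # e.g. "green"
print(work["best_oa_location"]["pdf_url"])  # direct PDF link
```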
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4415335368 |
| doi | |
| ids.openalex | https://openalex.org/W4415335368 |
| fwci | 0.0 |
| type | article |
| title | Adapting Decoder-Based Language Models for Diverse Encoder Downstream Tasks |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11902 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.8327999711036682 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Intelligent Tutoring Systems and Adaptive Learning |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2503.02656 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2503.02656 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2503.02656 |
| indexed_in | arxiv |
| authorships[0].author.id | https://openalex.org/A5104217740 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Paul Suganthan |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Suganthan, Paul |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5055241858 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Fédor Moiseev |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Moiseev, Fedor |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5073298244 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-1402-923X |
| authorships[2].author.display_name | Limei Yan |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Yan, Le |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5029597925 |
| authorships[3].author.orcid | https://orcid.org/0009-0000-5586-6926 |
| authorships[3].author.display_name | Junru Wu |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Wu, Junru |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5077817759 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-6863-8073 |
| authorships[4].author.display_name | Jianmo Ni |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Ni, Jianmo |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5020910026 |
| authorships[5].author.orcid | https://orcid.org/0000-0001-5618-0942 |
| authorships[5].author.display_name | Jay J. Han |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Han, Jay |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5108365460 |
| authorships[6].author.orcid | |
| authorships[6].author.display_name | Imed Zitouni |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Zitouni, Imed |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5091708404 |
| authorships[7].author.orcid | |
| authorships[7].author.display_name | Enrique Alfonseca |
| authorships[7].author_position | middle |
| authorships[7].raw_author_name | Alfonseca, Enrique |
| authorships[7].is_corresponding | False |
| authorships[8].author.id | https://openalex.org/A5064608039 |
| authorships[8].author.orcid | https://orcid.org/0009-0000-1388-1423 |
| authorships[8].author.display_name | Xuanhui Wang |
| authorships[8].author_position | middle |
| authorships[8].raw_author_name | Wang, Xuanhui |
| authorships[8].is_corresponding | False |
| authorships[9].author.id | https://openalex.org/A5101842429 |
| authorships[9].author.orcid | https://orcid.org/0000-0001-8993-1386 |
| authorships[9].author.display_name | Zhe Dong |
| authorships[9].author_position | last |
| authorships[9].raw_author_name | Dong, Zhe |
| authorships[9].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2503.02656 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-19T00:00:00 |
| display_name | Adapting Decoder-Based Language Models for Diverse Encoder Downstream Tasks |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T04:12:42.849631 |
| primary_topic.id | https://openalex.org/T11902 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.8327999711036682 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Intelligent Tutoring Systems and Adaptive Learning |
| cited_by_count | 0 |
| locations_count | 1 |
| best_oa_location.id | pmh:oai:arXiv.org:2503.02656 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2503.02656 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2503.02656 |
| primary_location.id | pmh:oai:arXiv.org:2503.02656 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2503.02656 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2503.02656 |
| publication_date | 2025-03-04 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index | (word-to-positions index of the abstract quoted above; see the reconstruction sketch after this table) |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 10 |
| citation_normalized_percentile.value | 0.226499 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | True |
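OpenAlex stores the abstract as the `abstract_inverted_index` field noted in the payload above: a map from each word to the positions where it occurs. A small sketch of turning such an index back into plain text; the helper name is ours, the field format is OpenAlex's.

```python
# Rebuild a plain-text abstract from an OpenAlex abstract_inverted_index,
# which maps each word to the list of positions where it appears.
def rebuild_abstract(inverted_index: dict) -> str:
    positions = {}
    for word, idxs in inverted_index.items():
        for i in idxs:
            positions[i] = word
    return " ".join(positions[i] for i in sorted(positions))

example = {"Decoder-based": [0], "transformers,": [1], "while": [2]}
print(rebuild_abstract(example))  # -> "Decoder-based transformers, while"
```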