TWICE: What Advantages Can Low-Resource Domain-Specific Embedding Model Bring? -- A Case Study on Korea Financial Texts
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2502.07131
Domain specificity of embedding models is critical for effective performance. However, existing benchmarks, such as FinMTEB, are primarily designed for high-resource languages, leaving low-resource settings, such as Korean, under-explored. Directly translating established English benchmarks often fails to capture the linguistic and cultural nuances present in low-resource domains. In this paper, titled TWICE: What Advantages Can Low-Resource Domain-Specific Embedding Models Bring? A Case Study on Korea Financial Texts, we introduce KorFinMTEB, a novel benchmark for the Korean financial domain, specifically tailored to reflect its unique cultural characteristics in low-resource languages. Our experimental results reveal that while the models perform robustly on a translated version of FinMTEB, their performance on KorFinMTEB uncovers subtle yet critical discrepancies, especially in tasks requiring deeper semantic understanding, that underscore the limitations of direct translation. This discrepancy highlights the necessity of benchmarks that incorporate language-specific idiosyncrasies and cultural nuances. The insights from our study advocate for the development of domain-specific evaluation frameworks that can more accurately assess and drive the progress of embedding models in low-resource settings.
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2502.07131, https://arxiv.org/pdf/2502.07131
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4407424101
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4407424101 (Canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2502.07131 (Digital Object Identifier)
- Title: TWICE: What Advantages Can Low-Resource Domain-Specific Embedding Model Bring? -- A Case Study on Korea Financial Texts (Work title)
- Type: preprint (OpenAlex work type)
- Language: en (Primary language)
- Publication year: 2025 (Year of publication)
- Publication date: 2025-02-10 (Full publication date if available)
- Authors: Y. S. Hwang, Sung Jun Jung, Hanwool Lee, Shui Yu (List of authors in order)
- Landing page: https://arxiv.org/abs/2502.07131 (Publisher landing page)
- PDF URL: https://arxiv.org/pdf/2502.07131 (Direct link to full text PDF)
- Open access: Yes (Whether a free full text is available)
- OA status: green (Open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2502.07131 (Direct OA link when available)
- Concepts: Embedding, Business, Domain (mathematical analysis), Resource (disambiguation), Finance, Computer science, Artificial intelligence, Mathematics, Mathematical analysis, Computer network (Top concepts (fields/topics) attached by OpenAlex)
- Cited by: 0 (Total citation count in OpenAlex)
- Related works (count): 10 (Other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4407424101 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2502.07131 |
| ids.doi | https://doi.org/10.48550/arxiv.2502.07131 |
| ids.openalex | https://openalex.org/W4407424101 |
| fwci | |
| type | preprint |
| title | TWICE: What Advantages Can Low-Resource Domain-Specific Embedding Model Bring? -- A Case Study on Korea Financial Texts |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T14419 |
| topics[0].field.id | https://openalex.org/fields/14 |
| topics[0].field.display_name | Business, Management and Accounting |
| topics[0].score | 0.6665999889373779 |
| topics[0].domain.id | https://openalex.org/domains/2 |
| topics[0].domain.display_name | Social Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1402 |
| topics[0].subfield.display_name | Accounting |
| topics[0].display_name | Banking Systems and Strategies |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41608201 |
| concepts[0].level | 2 |
| concepts[0].score | 0.6126416921615601 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q980509 |
| concepts[0].display_name | Embedding |
| concepts[1].id | https://openalex.org/C144133560 |
| concepts[1].level | 0 |
| concepts[1].score | 0.5170627236366272 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q4830453 |
| concepts[1].display_name | Business |
| concepts[2].id | https://openalex.org/C36503486 |
| concepts[2].level | 2 |
| concepts[2].score | 0.48625099658966064 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q11235244 |
| concepts[2].display_name | Domain (mathematical analysis) |
| concepts[3].id | https://openalex.org/C206345919 |
| concepts[3].level | 2 |
| concepts[3].score | 0.4177986979484558 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q20380951 |
| concepts[3].display_name | Resource (disambiguation) |
| concepts[4].id | https://openalex.org/C10138342 |
| concepts[4].level | 1 |
| concepts[4].score | 0.3774470090866089 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q43015 |
| concepts[4].display_name | Finance |
| concepts[5].id | https://openalex.org/C41008148 |
| concepts[5].level | 0 |
| concepts[5].score | 0.3443799316883087 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[5].display_name | Computer science |
| concepts[6].id | https://openalex.org/C154945302 |
| concepts[6].level | 1 |
| concepts[6].score | 0.14510813355445862 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[6].display_name | Artificial intelligence |
| concepts[7].id | https://openalex.org/C33923547 |
| concepts[7].level | 0 |
| concepts[7].score | 0.11742493510246277 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[7].display_name | Mathematics |
| concepts[8].id | https://openalex.org/C134306372 |
| concepts[8].level | 1 |
| concepts[8].score | 0.0 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q7754 |
| concepts[8].display_name | Mathematical analysis |
| concepts[9].id | https://openalex.org/C31258907 |
| concepts[9].level | 1 |
| concepts[9].score | 0.0 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q1301371 |
| concepts[9].display_name | Computer network |
| keywords[0].id | https://openalex.org/keywords/embedding |
| keywords[0].score | 0.6126416921615601 |
| keywords[0].display_name | Embedding |
| keywords[1].id | https://openalex.org/keywords/business |
| keywords[1].score | 0.5170627236366272 |
| keywords[1].display_name | Business |
| keywords[2].id | https://openalex.org/keywords/domain |
| keywords[2].score | 0.48625099658966064 |
| keywords[2].display_name | Domain (mathematical analysis) |
| keywords[3].id | https://openalex.org/keywords/resource |
| keywords[3].score | 0.4177986979484558 |
| keywords[3].display_name | Resource (disambiguation) |
| keywords[4].id | https://openalex.org/keywords/finance |
| keywords[4].score | 0.3774470090866089 |
| keywords[4].display_name | Finance |
| keywords[5].id | https://openalex.org/keywords/computer-science |
| keywords[5].score | 0.3443799316883087 |
| keywords[5].display_name | Computer science |
| keywords[6].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[6].score | 0.14510813355445862 |
| keywords[6].display_name | Artificial intelligence |
| keywords[7].id | https://openalex.org/keywords/mathematics |
| keywords[7].score | 0.11742493510246277 |
| keywords[7].display_name | Mathematics |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2502.07131 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2502.07131 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2502.07131 |
| locations[1].id | doi:10.48550/arxiv.2502.07131 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2502.07131 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5073016636 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-4010-9241 |
| authorships[0].author.display_name | Y. S. Hwang |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Hwang, Yewon |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5041517590 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-1051-6495 |
| authorships[1].author.display_name | Sung Jun Jung |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Jung, Sungbum |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5103110871 |
| authorships[2].author.orcid | https://orcid.org/0000-0001-5565-5184 |
| authorships[2].author.display_name | Hanwool Lee |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Lee, Hanwool |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5005228053 |
| authorships[3].author.orcid | https://orcid.org/0000-0003-4485-6743 |
| authorships[3].author.display_name | Shui Yu |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Yu, Sara |
| authorships[3].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2502.07131 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | TWICE: What Advantages Can Low-Resource Domain-Specific Embedding Model Bring? -- A Case Study on Korea Financial Texts |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T14419 |
| primary_topic.field.id | https://openalex.org/fields/14 |
| primary_topic.field.display_name | Business, Management and Accounting |
| primary_topic.score | 0.6665999889373779 |
| primary_topic.domain.id | https://openalex.org/domains/2 |
| primary_topic.domain.display_name | Social Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1402 |
| primary_topic.subfield.display_name | Accounting |
| primary_topic.display_name | Banking Systems and Strategies |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2081900870, https://openalex.org/W2037549926, https://openalex.org/W2345479200, https://openalex.org/W2183306018, https://openalex.org/W2849310602, https://openalex.org/W3006008237, https://openalex.org/W2419146053, https://openalex.org/W4388890789, https://openalex.org/W2088247287 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2502.07131 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2502.07131 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2502.07131 |
| primary_location.id | pmh:oai:arXiv.org:2502.07131 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2502.07131 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2502.07131 |
| publication_date | 2025-02-10 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.A | 60 |
| abstract_inverted_index.a | 70, 100 |
| abstract_inverted_index.In | 47 |
| abstract_inverted_index.as | 14, 26 |
| abstract_inverted_index.in | 44, 86, 115, 167 |
| abstract_inverted_index.is | 5 |
| abstract_inverted_index.of | 2, 103, 125, 133, 151, 164 |
| abstract_inverted_index.on | 63, 99, 107 |
| abstract_inverted_index.to | 36, 80 |
| abstract_inverted_index.we | 67 |
| abstract_inverted_index.Can | 54 |
| abstract_inverted_index.Our | 89 |
| abstract_inverted_index.The | 142 |
| abstract_inverted_index.and | 40, 139, 160 |
| abstract_inverted_index.are | 16 |
| abstract_inverted_index.can | 156 |
| abstract_inverted_index.for | 7, 19, 73, 148 |
| abstract_inverted_index.its | 82 |
| abstract_inverted_index.our | 145 |
| abstract_inverted_index.the | 38, 74, 95, 123, 131, 149, 162 |
| abstract_inverted_index.yet | 111 |
| abstract_inverted_index.Case | 61 |
| abstract_inverted_index.This | 128 |
| abstract_inverted_index.What | 52 |
| abstract_inverted_index.from | 144 |
| abstract_inverted_index.more | 157 |
| abstract_inverted_index.such | 13, 25 |
| abstract_inverted_index.that | 93, 121, 135, 155 |
| abstract_inverted_index.this | 48 |
| abstract_inverted_index.Korea | 64 |
| abstract_inverted_index.Study | 62 |
| abstract_inverted_index.drive | 161 |
| abstract_inverted_index.fails | 35 |
| abstract_inverted_index.novel | 71 |
| abstract_inverted_index.often | 34 |
| abstract_inverted_index.study | 146 |
| abstract_inverted_index.tasks | 116 |
| abstract_inverted_index.their | 105 |
| abstract_inverted_index.while | 94 |
| abstract_inverted_index.Bring? | 59 |
| abstract_inverted_index.Domain | 0 |
| abstract_inverted_index.Korean | 75 |
| abstract_inverted_index.Models | 58 |
| abstract_inverted_index.TWICE: | 51 |
| abstract_inverted_index.Texts, | 66 |
| abstract_inverted_index.assess | 159 |
| abstract_inverted_index.deeper | 118 |
| abstract_inverted_index.direct | 126 |
| abstract_inverted_index.models | 4, 96, 166 |
| abstract_inverted_index.paper, | 49 |
| abstract_inverted_index.reveal | 92 |
| abstract_inverted_index.subtle | 110 |
| abstract_inverted_index.titled | 50 |
| abstract_inverted_index.unique | 83 |
| abstract_inverted_index.English | 32 |
| abstract_inverted_index.Korean, | 27 |
| abstract_inverted_index.capture | 37 |
| abstract_inverted_index.domain, | 77 |
| abstract_inverted_index.leaving | 22 |
| abstract_inverted_index.nuances | 42 |
| abstract_inverted_index.perform | 97 |
| abstract_inverted_index.present | 43 |
| abstract_inverted_index.reflect | 81 |
| abstract_inverted_index.results | 91 |
| abstract_inverted_index.version | 102 |
| abstract_inverted_index.Directly | 29 |
| abstract_inverted_index.FinMTEB, | 15, 104 |
| abstract_inverted_index.However, | 10 |
| abstract_inverted_index.advocate | 147 |
| abstract_inverted_index.critical | 6, 112 |
| abstract_inverted_index.cultural | 41, 84, 140 |
| abstract_inverted_index.designed | 18 |
| abstract_inverted_index.domains. | 46 |
| abstract_inverted_index.existing | 11 |
| abstract_inverted_index.insights | 143 |
| abstract_inverted_index.nuances. | 141 |
| abstract_inverted_index.progress | 163 |
| abstract_inverted_index.robustly | 98 |
| abstract_inverted_index.semantic | 119 |
| abstract_inverted_index.tailored | 79 |
| abstract_inverted_index.uncovers | 109 |
| abstract_inverted_index.Embedding | 57 |
| abstract_inverted_index.Financial | 65 |
| abstract_inverted_index.benchmark | 72 |
| abstract_inverted_index.effective | 8 |
| abstract_inverted_index.embedding | 3, 165 |
| abstract_inverted_index.financial | 76 |
| abstract_inverted_index.introduce | 68 |
| abstract_inverted_index.necessity | 132 |
| abstract_inverted_index.primarily | 17 |
| abstract_inverted_index.requiring | 117 |
| abstract_inverted_index.settings, | 24 |
| abstract_inverted_index.settings. | 169 |
| abstract_inverted_index.Advantages | 53 |
| abstract_inverted_index.KorFinMTEB | 108 |
| abstract_inverted_index.accurately | 158 |
| abstract_inverted_index.benchmarks | 33, 134 |
| abstract_inverted_index.especially | 114 |
| abstract_inverted_index.evaluation | 153 |
| abstract_inverted_index.frameworks | 154 |
| abstract_inverted_index.highlights | 130 |
| abstract_inverted_index.languages, | 21 |
| abstract_inverted_index.languages. | 88 |
| abstract_inverted_index.linguistic | 39 |
| abstract_inverted_index.translated | 101 |
| abstract_inverted_index.underscore | 122 |
| abstract_inverted_index.KorFinMTEB, | 69 |
| abstract_inverted_index.benchmarks, | 12 |
| abstract_inverted_index.development | 150 |
| abstract_inverted_index.discrepancy | 129 |
| abstract_inverted_index.established | 31 |
| abstract_inverted_index.incorporate | 136 |
| abstract_inverted_index.limitations | 124 |
| abstract_inverted_index.performance | 106 |
| abstract_inverted_index.specificity | 1 |
| abstract_inverted_index.translating | 30 |
| abstract_inverted_index.Low-Resource | 55 |
| abstract_inverted_index.experimental | 90 |
| abstract_inverted_index.low-resource | 23, 45, 87, 168 |
| abstract_inverted_index.performance. | 9 |
| abstract_inverted_index.specifically | 78 |
| abstract_inverted_index.translation. | 127 |
| abstract_inverted_index.high-resource | 20 |
| abstract_inverted_index.discrepancies, | 113 |
| abstract_inverted_index.idiosyncrasies | 138 |
| abstract_inverted_index.understanding, | 120 |
| abstract_inverted_index.Domain-Specific | 56 |
| abstract_inverted_index.characteristics | 85 |
| abstract_inverted_index.domain-specific | 152 |
| abstract_inverted_index.under-explored. | 28 |
| abstract_inverted_index.language-specific | 137 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 4 |
| citation_normalized_percentile | |
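
The `abstract_inverted_index` rows in the table above map each token to the word positions at which it occurs in the abstract. As a minimal sketch, the plain-text abstract can be recovered by sorting the tokens by position. The function name `rebuild_abstract` and the ten-token `sample` below are illustrative assumptions, not part of the OpenAlex payload or of any library; `sample` copies only positions 0 through 9 from the rows above.

```python
def rebuild_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild running text from a {token: [positions]} mapping."""
    # Flatten the mapping into (position, token) pairs.
    positioned = [
        (position, token)
        for token, positions in inverted_index.items()
        for position in positions
    ]
    # Sorting the pairs orders the tokens by their position in the text.
    return " ".join(token for _, token in sorted(positioned))


# Hypothetical excerpt: the first ten positions of the index listed above.
sample = {
    "Domain": [0],
    "specificity": [1],
    "of": [2],
    "embedding": [3],
    "models": [4],
    "is": [5],
    "critical": [6],
    "for": [7],
    "effective": [8],
    "performance.": [9],
}

print(rebuild_abstract(sample))
# prints "Domain specificity of embedding models is critical for effective performance."
```

Applied to the full index, the same function should yield the complete abstract, since tokens that occur more than once (for example `low-resource` at positions 23, 45, 87, and 168) are emitted once per listed position.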