Speech Translation Refinement using Large Language Models
2025 · Open Access
· DOI: https://doi.org/10.48550/arxiv.2501.15090
Recent advancements in large language models (LLMs) have demonstrated their remarkable capabilities across various language tasks. Inspired by the success of text-to-text translation refinement, this paper investigates how LLMs can improve the performance of speech translation by introducing a joint refinement process. Through the joint refinement of speech translation (ST) and automatic speech recognition (ASR) transcription via LLMs, the performance of the ST model is significantly improved in both training-free in-context learning and parameter-efficient fine-tuning scenarios. Additionally, we explore the effect of document-level context on refinement under the context-aware fine-tuning scenario. Experimental results on the MuST-C and CoVoST 2 datasets, which include seven translation tasks, demonstrate the effectiveness of the proposed approach using several popular LLMs including GPT-3.5-turbo, LLaMA3-8B, and Mistral-12B. Further analysis suggests that jointly refining both transcription and translation yields better performance than refining translation alone. Meanwhile, incorporating document-level context significantly enhances refinement performance. We release our code and datasets on GitHub.
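The joint refinement idea in the abstract can be illustrated with a small prompt-building sketch for the training-free in-context setting. Everything here is an assumption made for illustration: the function name `build_joint_refinement_prompt`, its parameters, the prompt wording, and the example sentences are hypothetical and are not taken from the paper or its released code.

```python
from typing import Optional, Sequence


def build_joint_refinement_prompt(
    asr_transcript: str,
    st_translation: str,
    source_lang: str,
    target_lang: str,
    context: Optional[Sequence[str]] = None,
) -> str:
    """Assemble one prompt that asks an LLM to refine both hypotheses.

    The wording is illustrative only; it is not the prompt used in the paper.
    """
    lines = [
        f"You are given a {source_lang} ASR transcript and its {target_lang} "
        "translation produced by a speech translation system.",
        "Both may contain errors. Correct the transcript, then the translation.",
    ]
    # Optional document-level context: preceding sentences, if available.
    if context:
        lines.append("Preceding sentences from the same document:")
        lines.extend(f"- {sentence}" for sentence in context)
    lines += [
        f"ASR transcript: {asr_transcript}",
        f"ST translation: {st_translation}",
        "Refined transcript:",
        "Refined translation:",
    ]
    return "\n".join(lines)


# Hypothetical inputs, invented for this example.
prompt = build_joint_refinement_prompt(
    asr_transcript="i red the book yesterday",
    st_translation="Ich rot das Buch gestern",
    source_lang="English",
    target_lang="German",
    context=["She gave me a novel last week."],
)
print(prompt)
```

Putting both hypotheses in one prompt lets the model use the transcript as evidence when correcting the translation, which is the intuition behind refining them jointly rather than refining the translation alone.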
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2501.15090, https://arxiv.org/pdf/2501.15090
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4406880067
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4406880067
- DOI: https://doi.org/10.48550/arxiv.2501.15090
- Title: Speech Translation Refinement using Large Language Models
- Type: preprint
- Language: en
- Publication year: 2025
- Publication date: 2025-01-25
- Authors (in order): Henri Dou, Xinyu Tian, Xinglin Lyu, Jie Zhu, Junhui Li, Lifan Guo
- Landing page: https://arxiv.org/abs/2501.15090
- PDF URL: https://arxiv.org/pdf/2501.15090
- Open access: Yes (a free full text is available)
- OA status: green (per OpenAlex)
- OA URL: https://arxiv.org/pdf/2501.15090
- Concepts (top concepts attached by OpenAlex): Translation (biology), Computer science, Speech translation, Natural language processing, Speech recognition, Artificial intelligence, Linguistics, Machine translation, Philosophy, Chemistry, Messenger RNA, Gene, Biochemistry
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4406880067 |
| doi | https://doi.org/10.48550/arxiv.2501.15090 |
| ids.doi | https://doi.org/10.48550/arxiv.2501.15090 |
| ids.openalex | https://openalex.org/W4406880067 |
| fwci | |
| type | preprint |
| title | Speech Translation Refinement using Large Language Models |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10181 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9466000199317932 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Natural Language Processing Techniques |
| topics[1].id | https://openalex.org/T10201 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9394000172615051 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Speech Recognition and Synthesis |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C149364088 |
| concepts[0].level | 4 |
| concepts[0].score | 0.7173244953155518 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q185917 |
| concepts[0].display_name | Translation (biology) |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.7011228799819946 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C2780366754 |
| concepts[2].level | 3 |
| concepts[2].score | 0.6168929934501648 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q7494857 |
| concepts[2].display_name | Speech translation |
| concepts[3].id | https://openalex.org/C204321447 |
| concepts[3].level | 1 |
| concepts[3].score | 0.49899935722351074 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[3].display_name | Natural language processing |
| concepts[4].id | https://openalex.org/C28490314 |
| concepts[4].level | 1 |
| concepts[4].score | 0.38027000427246094 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q189436 |
| concepts[4].display_name | Speech recognition |
| concepts[5].id | https://openalex.org/C154945302 |
| concepts[5].level | 1 |
| concepts[5].score | 0.35797402262687683 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[5].display_name | Artificial intelligence |
| concepts[6].id | https://openalex.org/C41895202 |
| concepts[6].level | 1 |
| concepts[6].score | 0.3258877396583557 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q8162 |
| concepts[6].display_name | Linguistics |
| concepts[7].id | https://openalex.org/C203005215 |
| concepts[7].level | 2 |
| concepts[7].score | 0.30750566720962524 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q79798 |
| concepts[7].display_name | Machine translation |
| concepts[8].id | https://openalex.org/C138885662 |
| concepts[8].level | 0 |
| concepts[8].score | 0.0 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q5891 |
| concepts[8].display_name | Philosophy |
| concepts[9].id | https://openalex.org/C185592680 |
| concepts[9].level | 0 |
| concepts[9].score | 0.0 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q2329 |
| concepts[9].display_name | Chemistry |
| concepts[10].id | https://openalex.org/C105580179 |
| concepts[10].level | 3 |
| concepts[10].score | 0.0 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q188928 |
| concepts[10].display_name | Messenger RNA |
| concepts[11].id | https://openalex.org/C104317684 |
| concepts[11].level | 2 |
| concepts[11].score | 0.0 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q7187 |
| concepts[11].display_name | Gene |
| concepts[12].id | https://openalex.org/C55493867 |
| concepts[12].level | 1 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q7094 |
| concepts[12].display_name | Biochemistry |
| keywords[0].id | https://openalex.org/keywords/translation |
| keywords[0].score | 0.7173244953155518 |
| keywords[0].display_name | Translation (biology) |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.7011228799819946 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/speech-translation |
| keywords[2].score | 0.6168929934501648 |
| keywords[2].display_name | Speech translation |
| keywords[3].id | https://openalex.org/keywords/natural-language-processing |
| keywords[3].score | 0.49899935722351074 |
| keywords[3].display_name | Natural language processing |
| keywords[4].id | https://openalex.org/keywords/speech-recognition |
| keywords[4].score | 0.38027000427246094 |
| keywords[4].display_name | Speech recognition |
| keywords[5].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[5].score | 0.35797402262687683 |
| keywords[5].display_name | Artificial intelligence |
| keywords[6].id | https://openalex.org/keywords/linguistics |
| keywords[6].score | 0.3258877396583557 |
| keywords[6].display_name | Linguistics |
| keywords[7].id | https://openalex.org/keywords/machine-translation |
| keywords[7].score | 0.30750566720962524 |
| keywords[7].display_name | Machine translation |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2501.15090 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2501.15090 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2501.15090 |
| locations[1].id | doi:10.48550/arxiv.2501.15090 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2501.15090 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5009648801 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-2990-5589 |
| authorships[0].author.display_name | Henri Dou |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Dou, Huaixia |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5029944382 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-1247-6076 |
| authorships[1].author.display_name | Xinyu Tian |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Tian, Xinyu |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5077286641 |
| authorships[2].author.orcid | https://orcid.org/0000-0003-1971-6618 |
| authorships[2].author.display_name | Xinglin Lyu |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Lyu, Xinglin |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5031936679 |
| authorships[3].author.orcid | https://orcid.org/0000-0001-6862-9022 |
| authorships[3].author.display_name | Jie Zhu |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Zhu, Jie |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5100369260 |
| authorships[4].author.orcid | https://orcid.org/0000-0001-7829-6348 |
| authorships[4].author.display_name | Junhui Li |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Li, Junhui |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5056415939 |
| authorships[5].author.orcid | |
| authorships[5].author.display_name | Lifan Guo |
| authorships[5].author_position | last |
| authorships[5].raw_author_name | Guo, Lifan |
| authorships[5].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2501.15090 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Speech Translation Refinement using Large Language Models |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10181 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9466000199317932 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Natural Language Processing Techniques |
| related_works | https://openalex.org/W2775554247, https://openalex.org/W2883671469, https://openalex.org/W2728761353, https://openalex.org/W2110168585, https://openalex.org/W3107474891, https://openalex.org/W2250213760, https://openalex.org/W4386247111, https://openalex.org/W4327642362, https://openalex.org/W2587014613, https://openalex.org/W123774389 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2501.15090 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2501.15090 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2501.15090 |
| primary_location.id | pmh:oai:arXiv.org:2501.15090 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2501.15090 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2501.15090 |
| publication_date | 2025-01-25 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.2 | 98 |
| abstract_inverted_index.a | 38 |
| abstract_inverted_index.ST | 62 |
| abstract_inverted_index.We | 148 |
| abstract_inverted_index.by | 17, 36 |
| abstract_inverted_index.in | 2, 67 |
| abstract_inverted_index.is | 64 |
| abstract_inverted_index.of | 20, 33, 46, 60, 81, 108 |
| abstract_inverted_index.on | 84, 93, 154 |
| abstract_inverted_index.to | 136 |
| abstract_inverted_index.we | 77 |
| abstract_inverted_index.and | 50, 72, 96, 119, 130, 152 |
| abstract_inverted_index.can | 29 |
| abstract_inverted_index.how | 27 |
| abstract_inverted_index.our | 150 |
| abstract_inverted_index.the | 18, 31, 43, 58, 61, 79, 87, 94, 106, 109 |
| abstract_inverted_index.via | 56 |
| abstract_inverted_index.(ST) | 49 |
| abstract_inverted_index.LLMs | 28, 115 |
| abstract_inverted_index.both | 68, 128 |
| abstract_inverted_index.code | 151 |
| abstract_inverted_index.have | 7 |
| abstract_inverted_index.that | 125 |
| abstract_inverted_index.this | 24 |
| abstract_inverted_index.(ASR) | 54 |
| abstract_inverted_index.LLMs, | 57 |
| abstract_inverted_index.joint | 39, 44 |
| abstract_inverted_index.large | 3 |
| abstract_inverted_index.model | 63 |
| abstract_inverted_index.paper | 25 |
| abstract_inverted_index.seven | 102 |
| abstract_inverted_index.their | 9 |
| abstract_inverted_index.under | 86 |
| abstract_inverted_index.using | 112 |
| abstract_inverted_index.which | 100 |
| abstract_inverted_index.(LLMs) | 6 |
| abstract_inverted_index.CoVoST | 97 |
| abstract_inverted_index.MuST-C | 95 |
| abstract_inverted_index.Recent | 0 |
| abstract_inverted_index.across | 12 |
| abstract_inverted_index.alone. | 139 |
| abstract_inverted_index.better | 133 |
| abstract_inverted_index.effect | 80 |
| abstract_inverted_index.models | 5 |
| abstract_inverted_index.speech | 34, 47, 52 |
| abstract_inverted_index.tasks, | 104 |
| abstract_inverted_index.tasks. | 15 |
| abstract_inverted_index.yields | 132 |
| abstract_inverted_index.Further | 121 |
| abstract_inverted_index.GitHub. | 155 |
| abstract_inverted_index.Through | 42 |
| abstract_inverted_index.context | 83, 143 |
| abstract_inverted_index.explore | 78 |
| abstract_inverted_index.further | 123 |
| abstract_inverted_index.improve | 30 |
| abstract_inverted_index.include | 101 |
| abstract_inverted_index.jointly | 126 |
| abstract_inverted_index.popular | 114 |
| abstract_inverted_index.release | 149 |
| abstract_inverted_index.results | 92 |
| abstract_inverted_index.several | 113 |
| abstract_inverted_index.success | 19 |
| abstract_inverted_index.various | 13 |
| abstract_inverted_index.Inspired | 16 |
| abstract_inverted_index.analysis | 122 |
| abstract_inverted_index.approach | 111 |
| abstract_inverted_index.compared | 135 |
| abstract_inverted_index.datasets | 153 |
| abstract_inverted_index.enhances | 145 |
| abstract_inverted_index.improved | 66 |
| abstract_inverted_index.language | 4, 14 |
| abstract_inverted_index.learning | 71 |
| abstract_inverted_index.process. | 41 |
| abstract_inverted_index.proposed | 110 |
| abstract_inverted_index.refining | 127, 137 |
| abstract_inverted_index.suggests | 124 |
| abstract_inverted_index.automatic | 51 |
| abstract_inverted_index.datasets, | 99 |
| abstract_inverted_index.including | 116 |
| abstract_inverted_index.scenario. | 90 |
| abstract_inverted_index.LLaMA3-8B, | 118 |
| abstract_inverted_index.Meanwhile, | 140 |
| abstract_inverted_index.in-context | 70 |
| abstract_inverted_index.refinement | 40, 45, 85, 146 |
| abstract_inverted_index.remarkable | 10 |
| abstract_inverted_index.scenarios. | 75 |
| abstract_inverted_index.demonstrate | 105 |
| abstract_inverted_index.fine-tuning | 74, 89 |
| abstract_inverted_index.introducing | 37 |
| abstract_inverted_index.performance | 32, 59, 134 |
| abstract_inverted_index.recognition | 53 |
| abstract_inverted_index.refinement, | 23 |
| abstract_inverted_index.translation | 22, 35, 48, 103, 131, 138 |
| abstract_inverted_index.Experimental | 91 |
| abstract_inverted_index.Mistral-12B. | 120 |
| abstract_inverted_index.advancements | 1 |
| abstract_inverted_index.capabilities | 11 |
| abstract_inverted_index.demonstrated | 8 |
| abstract_inverted_index.investigates | 26 |
| abstract_inverted_index.performance. | 147 |
| abstract_inverted_index.text-to-text | 21 |
| abstract_inverted_index.Additionally, | 76 |
| abstract_inverted_index.context-aware | 88 |
| abstract_inverted_index.effectiveness | 107 |
| abstract_inverted_index.incorporating | 141 |
| abstract_inverted_index.significantly | 65, 144 |
| abstract_inverted_index.training-free | 69 |
| abstract_inverted_index.transcription | 55, 129 |
| abstract_inverted_index.GPT-3.5-turbo, | 117 |
| abstract_inverted_index.document-level | 82, 142 |
| abstract_inverted_index.parameter-efficient | 73 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 6 |
| citation_normalized_percentile |
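
The `abstract_inverted_index.*` rows above store the abstract as a mapping from each word to the positions where it occurs. A minimal sketch of how such a mapping can be turned back into running text follows; the function name `rebuild_abstract` is an assumption for illustration, and `sample` is only an excerpt of the table, restricted to positions 0 through 6.

```python
from typing import Dict, List


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Rebuild text from a word -> list-of-positions mapping."""
    by_position: Dict[int, str] = {}
    for word, positions in inverted_index.items():
        for position in positions:
            by_position[position] = word
    # Emit the words in ascending position order, separated by spaces.
    return " ".join(by_position[p] for p in sorted(by_position))


# Excerpt of the rows above, keeping only positions 0-6.
sample = {
    "Recent": [0],
    "advancements": [1],
    "in": [2],
    "large": [3],
    "language": [4],
    "models": [5],
    "(LLMs)": [6],
}
print(rebuild_abstract(sample))
# prints: Recent advancements in large language models (LLMs)
```

Because punctuation is stored attached to the words (for example `tasks,` and `tasks.` are separate keys), joining on single spaces is enough to recover the sentence boundaries.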