Speech Translation Refinement using Large Language Models

Henri Dou, Xinyu Tian, Xinglin Lyu, Jie Zhu, Junhui Li, Lifan Guo

2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2501.15090

Recent advancements in large language models (LLMs) have demonstrated their remarkable capabilities across various language tasks. Inspired by the success of text-to-text translation refinement, this paper investigates how LLMs can improve the performance of speech translation by introducing a joint refinement process. Through the joint refinement of speech translation (ST) and automatic speech recognition (ASR) transcription via LLMs, the performance of the ST model is significantly improved in both training-free in-context learning and parameter-efficient fine-tuning scenarios. Additionally, we explore the effect of document-level context on refinement under the context-aware fine-tuning scenario. Experimental results on the MuST-C and CoVoST 2 datasets, which include seven translation tasks, demonstrate the effectiveness of the proposed approach using several popular LLMs including GPT-3.5-turbo, LLaMA3-8B, and Mistral-12B. Further analysis suggests that jointly refining both transcription and translation yields better performance compared to refining translation alone. Meanwhile, incorporating document-level context significantly enhances refinement performance. We release our code and datasets on GitHub.
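To make the in-context joint refinement idea from the abstract more concrete, the following is a minimal sketch of how a few-shot prompt could pair an ASR transcription with its ST output. The prompt wording, the `Demo` class, the `build_prompt` function, and the English-to-German default are all assumptions made for illustration; this record does not include the paper's actual prompts.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Demo:
    """One in-context demonstration (hypothetical structure)."""
    asr: str          # ASR transcription, possibly with errors
    st: str           # ST output, possibly with errors
    refined_asr: str  # corrected transcription
    refined_st: str   # corrected translation


def build_prompt(asr: str, st: str, demos: List[Demo],
                 src_lang: str = "English", tgt_lang: str = "German") -> str:
    """Build a few-shot prompt asking an LLM to refine both texts jointly.

    The instruction text is an assumed template, not the authors' prompt.
    """
    lines = [
        f"Refine the {src_lang} transcription and its {tgt_lang} translation.",
        "Correct recognition and translation errors; keep the meaning unchanged.",
        "",
    ]
    # Each demonstration shows the noisy pair followed by the refined pair.
    for d in demos:
        lines += [
            f"Transcription: {d.asr}",
            f"Translation: {d.st}",
            f"Refined transcription: {d.refined_asr}",
            f"Refined translation: {d.refined_st}",
            "",
        ]
    # The query ends with an open slot so the model refines the
    # transcription first and the translation second.
    lines += [
        f"Transcription: {asr}",
        f"Translation: {st}",
        "Refined transcription:",
    ]
    return "\n".join(lines)


demo = Demo("i like two read", "Ich mag zwei lesen",
            "I like to read.", "Ich lese gern.")
print(build_prompt("we sea the ship", "Wir Meer das Schiff", [demo]))
```

Asking for the refined transcription before the refined translation is one plausible ordering for a joint setup; a translation-only baseline would simply drop the transcription lines.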

Related Topics

Computer Science

Artificial Intelligence

Concepts

Translation (biology), Computer science, Speech translation, Natural language processing, Speech recognition, Artificial intelligence, Linguistics, Machine translation, Philosophy, Chemistry, Messenger RNA, Gene, Biochemistry

Metadata

Type: preprint
Language: en
Landing Page: http://arxiv.org/abs/2501.15090
PDF: https://arxiv.org/pdf/2501.15090
OA Status: green
Related Works: 10
OpenAlex ID: https://openalex.org/W4406880067

All OpenAlex metadata

OpenAlex ID: https://openalex.org/W4406880067
Canonical identifier for this work in OpenAlex

DOI: https://doi.org/10.48550/arxiv.2501.15090
Digital Object Identifier

Title: Speech Translation Refinement using Large Language Models
Work title

Type: preprint
OpenAlex work type

Language: en
Primary language

Publication year: 2025
Year of publication

Publication date: 2025-01-25
Full publication date if available

Authors: Henri Dou, Xinyu Tian, Xinglin Lyu, Jie Zhu, Junhui Li, Lifan Guo
List of authors in order

Landing page: https://arxiv.org/abs/2501.15090
Publisher landing page

PDF URL: https://arxiv.org/pdf/2501.15090
Direct link to full text PDF

Open access: Yes
Whether a free full text is available

OA status: green
Open access status per OpenAlex

OA URL: https://arxiv.org/pdf/2501.15090
Direct OA link when available

Concepts: Translation (biology), Computer science, Speech translation, Natural language processing, Speech recognition, Artificial intelligence, Linguistics, Machine translation, Philosophy, Chemistry, Messenger RNA, Gene, Biochemistry
Top concepts (fields/topics) attached by OpenAlex

Cited by: 0
Total citation count in OpenAlex

Related works (count): 10
Other works algorithmically related by OpenAlex
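
The DOI, landing page, and PDF URL listed above all follow from the arXiv identifier 2501.15090. The short sketch below shows that relationship; the helper name `arxiv_links` is an assumption for illustration, and it only builds strings without contacting any service.

```python
def arxiv_links(arxiv_id: str) -> dict:
    """Derive the standard links for an arXiv preprint from its identifier.

    arXiv-issued DOIs use the 10.48550 prefix; DOIs are case-insensitive,
    and this record spells the suffix in lowercase.
    """
    return {
        "doi": f"https://doi.org/10.48550/arxiv.{arxiv_id}",
        "landing_page": f"https://arxiv.org/abs/{arxiv_id}",
        "pdf": f"https://arxiv.org/pdf/{arxiv_id}",
    }


links = arxiv_links("2501.15090")
for name, url in links.items():
    print(f"{name}: {url}")
```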

Full payload

id	https://openalex.org/W4406880067
doi	https://doi.org/10.48550/arxiv.2501.15090
ids.doi	https://doi.org/10.48550/arxiv.2501.15090
ids.openalex	https://openalex.org/W4406880067
fwci
type	preprint
title	Speech Translation Refinement using Large Language Models
biblio.issue
biblio.volume
biblio.last_page
biblio.first_page
topics[0].id	https://openalex.org/T10181
topics[0].field.id	https://openalex.org/fields/17
topics[0].field.display_name	Computer Science
topics[0].score	0.9466000199317932
topics[0].domain.id	https://openalex.org/domains/3
topics[0].domain.display_name	Physical Sciences
topics[0].subfield.id	https://openalex.org/subfields/1702
topics[0].subfield.display_name	Artificial Intelligence
topics[0].display_name	Natural Language Processing Techniques
topics[1].id	https://openalex.org/T10201
topics[1].field.id	https://openalex.org/fields/17
topics[1].field.display_name	Computer Science
topics[1].score	0.9394000172615051
topics[1].domain.id	https://openalex.org/domains/3
topics[1].domain.display_name	Physical Sciences
topics[1].subfield.id	https://openalex.org/subfields/1702
topics[1].subfield.display_name	Artificial Intelligence
topics[1].display_name	Speech Recognition and Synthesis
is_xpac	False
apc_list
apc_paid
concepts[0].id	https://openalex.org/C149364088
concepts[0].level	4
concepts[0].score	0.7173244953155518
concepts[0].wikidata	https://www.wikidata.org/wiki/Q185917
concepts[0].display_name	Translation (biology)
concepts[1].id	https://openalex.org/C41008148
concepts[1].level	0
concepts[1].score	0.7011228799819946
concepts[1].wikidata	https://www.wikidata.org/wiki/Q21198
concepts[1].display_name	Computer science
concepts[2].id	https://openalex.org/C2780366754
concepts[2].level	3
concepts[2].score	0.6168929934501648
concepts[2].wikidata	https://www.wikidata.org/wiki/Q7494857
concepts[2].display_name	Speech translation
concepts[3].id	https://openalex.org/C204321447
concepts[3].level	1
concepts[3].score	0.49899935722351074
concepts[3].wikidata	https://www.wikidata.org/wiki/Q30642
concepts[3].display_name	Natural language processing
concepts[4].id	https://openalex.org/C28490314
concepts[4].level	1
concepts[4].score	0.38027000427246094
concepts[4].wikidata	https://www.wikidata.org/wiki/Q189436
concepts[4].display_name	Speech recognition
concepts[5].id	https://openalex.org/C154945302
concepts[5].level	1
concepts[5].score	0.35797402262687683
concepts[5].wikidata	https://www.wikidata.org/wiki/Q11660
concepts[5].display_name	Artificial intelligence
concepts[6].id	https://openalex.org/C41895202
concepts[6].level	1
concepts[6].score	0.3258877396583557
concepts[6].wikidata	https://www.wikidata.org/wiki/Q8162
concepts[6].display_name	Linguistics
concepts[7].id	https://openalex.org/C203005215
concepts[7].level	2
concepts[7].score	0.30750566720962524
concepts[7].wikidata	https://www.wikidata.org/wiki/Q79798
concepts[7].display_name	Machine translation
concepts[8].id	https://openalex.org/C138885662
concepts[8].level	0
concepts[8].score	0.0
concepts[8].wikidata	https://www.wikidata.org/wiki/Q5891
concepts[8].display_name	Philosophy
concepts[9].id	https://openalex.org/C185592680
concepts[9].level	0
concepts[9].score	0.0
concepts[9].wikidata	https://www.wikidata.org/wiki/Q2329
concepts[9].display_name	Chemistry
concepts[10].id	https://openalex.org/C105580179
concepts[10].level	3
concepts[10].score	0.0
concepts[10].wikidata	https://www.wikidata.org/wiki/Q188928
concepts[10].display_name	Messenger RNA
concepts[11].id	https://openalex.org/C104317684
concepts[11].level	2
concepts[11].score	0.0
concepts[11].wikidata	https://www.wikidata.org/wiki/Q7187
concepts[11].display_name	Gene
concepts[12].id	https://openalex.org/C55493867
concepts[12].level	1
concepts[12].score	0.0
concepts[12].wikidata	https://www.wikidata.org/wiki/Q7094
concepts[12].display_name	Biochemistry
keywords[0].id	https://openalex.org/keywords/translation
keywords[0].score	0.7173244953155518
keywords[0].display_name	Translation (biology)
keywords[1].id	https://openalex.org/keywords/computer-science
keywords[1].score	0.7011228799819946
keywords[1].display_name	Computer science
keywords[2].id	https://openalex.org/keywords/speech-translation
keywords[2].score	0.6168929934501648
keywords[2].display_name	Speech translation
keywords[3].id	https://openalex.org/keywords/natural-language-processing
keywords[3].score	0.49899935722351074
keywords[3].display_name	Natural language processing
keywords[4].id	https://openalex.org/keywords/speech-recognition
keywords[4].score	0.38027000427246094
keywords[4].display_name	Speech recognition
keywords[5].id	https://openalex.org/keywords/artificial-intelligence
keywords[5].score	0.35797402262687683
keywords[5].display_name	Artificial intelligence
keywords[6].id	https://openalex.org/keywords/linguistics
keywords[6].score	0.3258877396583557
keywords[6].display_name	Linguistics
keywords[7].id	https://openalex.org/keywords/machine-translation
keywords[7].score	0.30750566720962524
keywords[7].display_name	Machine translation
language	en
locations[0].id	pmh:oai:arXiv.org:2501.15090
locations[0].is_oa	True
locations[0].source.id	https://openalex.org/S4306400194
locations[0].source.issn
locations[0].source.type	repository
locations[0].source.is_oa	True
locations[0].source.issn_l
locations[0].source.is_core	False
locations[0].source.is_in_doaj	False
locations[0].source.display_name	arXiv (Cornell University)
locations[0].source.host_organization	https://openalex.org/I205783295
locations[0].source.host_organization_name	Cornell University
locations[0].source.host_organization_lineage	https://openalex.org/I205783295
locations[0].license
locations[0].pdf_url	https://arxiv.org/pdf/2501.15090
locations[0].version	submittedVersion
locations[0].raw_type	text
locations[0].license_id
locations[0].is_accepted	False
locations[0].is_published	False
locations[0].raw_source_name
locations[0].landing_page_url	http://arxiv.org/abs/2501.15090
locations[1].id	doi:10.48550/arxiv.2501.15090
locations[1].is_oa	True
locations[1].source.id	https://openalex.org/S4306400194
locations[1].source.issn
locations[1].source.type	repository
locations[1].source.is_oa	True
locations[1].source.issn_l
locations[1].source.is_core	False
locations[1].source.is_in_doaj	False
locations[1].source.display_name	arXiv (Cornell University)
locations[1].source.host_organization	https://openalex.org/I205783295
locations[1].source.host_organization_name	Cornell University
locations[1].source.host_organization_lineage	https://openalex.org/I205783295
locations[1].license
locations[1].pdf_url
locations[1].version
locations[1].raw_type	article
locations[1].license_id
locations[1].is_accepted	False
locations[1].is_published
locations[1].raw_source_name
locations[1].landing_page_url	https://doi.org/10.48550/arxiv.2501.15090
indexed_in	arxiv, datacite
authorships[0].author.id	https://openalex.org/A5009648801
authorships[0].author.orcid	https://orcid.org/0000-0002-2990-5589
authorships[0].author.display_name	Henri Dou
authorships[0].author_position	first
authorships[0].raw_author_name	Dou, Huaixia
authorships[0].is_corresponding	False
authorships[1].author.id	https://openalex.org/A5029944382
authorships[1].author.orcid	https://orcid.org/0000-0003-1247-6076
authorships[1].author.display_name	Xinyu Tian
authorships[1].author_position	middle
authorships[1].raw_author_name	Tian, Xinyu
authorships[1].is_corresponding	False
authorships[2].author.id	https://openalex.org/A5077286641
authorships[2].author.orcid	https://orcid.org/0000-0003-1971-6618
authorships[2].author.display_name	Xinglin Lyu
authorships[2].author_position	middle
authorships[2].raw_author_name	Lyu, Xinglin
authorships[2].is_corresponding	False
authorships[3].author.id	https://openalex.org/A5031936679
authorships[3].author.orcid	https://orcid.org/0000-0001-6862-9022
authorships[3].author.display_name	Jie Zhu
authorships[3].author_position	middle
authorships[3].raw_author_name	Zhu, Jie
authorships[3].is_corresponding	False
authorships[4].author.id	https://openalex.org/A5100369260
authorships[4].author.orcid	https://orcid.org/0000-0001-7829-6348
authorships[4].author.display_name	Junhui Li
authorships[4].author_position	middle
authorships[4].raw_author_name	Li, Junhui
authorships[4].is_corresponding	False
authorships[5].author.id	https://openalex.org/A5056415939
authorships[5].author.orcid
authorships[5].author.display_name	Lifan Guo
authorships[5].author_position	last
authorships[5].raw_author_name	Guo, Lifan
authorships[5].is_corresponding	False
has_content.pdf	False
has_content.grobid_xml	False
is_paratext	False
open_access.is_oa	True
open_access.oa_url	https://arxiv.org/pdf/2501.15090
open_access.oa_status	green
open_access.any_repository_has_fulltext	False
created_date	2025-10-10T00:00:00
display_name	Speech Translation Refinement using Large Language Models
has_fulltext	False
is_retracted	False
updated_date	2025-11-06T06:51:31.235846
primary_topic.id	https://openalex.org/T10181
primary_topic.field.id	https://openalex.org/fields/17
primary_topic.field.display_name	Computer Science
primary_topic.score	0.9466000199317932
primary_topic.domain.id	https://openalex.org/domains/3
primary_topic.domain.display_name	Physical Sciences
primary_topic.subfield.id	https://openalex.org/subfields/1702
primary_topic.subfield.display_name	Artificial Intelligence
primary_topic.display_name	Natural Language Processing Techniques
related_works	https://openalex.org/W2775554247, https://openalex.org/W2883671469, https://openalex.org/W2728761353, https://openalex.org/W2110168585, https://openalex.org/W3107474891, https://openalex.org/W2250213760, https://openalex.org/W4386247111, https://openalex.org/W4327642362, https://openalex.org/W2587014613, https://openalex.org/W123774389
cited_by_count	0
locations_count	2
best_oa_location.id	pmh:oai:arXiv.org:2501.15090
best_oa_location.is_oa	True
best_oa_location.source.id	https://openalex.org/S4306400194
best_oa_location.source.issn
best_oa_location.source.type	repository
best_oa_location.source.is_oa	True
best_oa_location.source.issn_l
best_oa_location.source.is_core	False
best_oa_location.source.is_in_doaj	False
best_oa_location.source.display_name	arXiv (Cornell University)
best_oa_location.source.host_organization	https://openalex.org/I205783295
best_oa_location.source.host_organization_name	Cornell University
best_oa_location.source.host_organization_lineage	https://openalex.org/I205783295
best_oa_location.license
best_oa_location.pdf_url	https://arxiv.org/pdf/2501.15090
best_oa_location.version	submittedVersion
best_oa_location.raw_type	text
best_oa_location.license_id
best_oa_location.is_accepted	False
best_oa_location.is_published	False
best_oa_location.raw_source_name
best_oa_location.landing_page_url	http://arxiv.org/abs/2501.15090
primary_location.id	pmh:oai:arXiv.org:2501.15090
primary_location.is_oa	True
primary_location.source.id	https://openalex.org/S4306400194
primary_location.source.issn
primary_location.source.type	repository
primary_location.source.is_oa	True
primary_location.source.issn_l
primary_location.source.is_core	False
primary_location.source.is_in_doaj	False
primary_location.source.display_name	arXiv (Cornell University)
primary_location.source.host_organization	https://openalex.org/I205783295
primary_location.source.host_organization_name	Cornell University
primary_location.source.host_organization_lineage	https://openalex.org/I205783295
primary_location.license
primary_location.pdf_url	https://arxiv.org/pdf/2501.15090
primary_location.version	submittedVersion
primary_location.raw_type	text
primary_location.license_id
primary_location.is_accepted	False
primary_location.is_published	False
primary_location.raw_source_name
primary_location.landing_page_url	http://arxiv.org/abs/2501.15090
publication_date	2025-01-25
publication_year	2025
referenced_works_count	0
abstract_inverted_index.2	98
abstract_inverted_index.a	38
abstract_inverted_index.ST	62
abstract_inverted_index.We	148
abstract_inverted_index.by	17, 36
abstract_inverted_index.in	2, 67
abstract_inverted_index.is	64
abstract_inverted_index.of	20, 33, 46, 60, 81, 108
abstract_inverted_index.on	84, 93, 154
abstract_inverted_index.to	136
abstract_inverted_index.we	77
abstract_inverted_index.and	50, 72, 96, 119, 130, 152
abstract_inverted_index.can	29
abstract_inverted_index.how	27
abstract_inverted_index.our	150
abstract_inverted_index.the	18, 31, 43, 58, 61, 79, 87, 94, 106, 109
abstract_inverted_index.via	56
abstract_inverted_index.(ST)	49
abstract_inverted_index.LLMs	28, 115
abstract_inverted_index.both	68, 128
abstract_inverted_index.code	151
abstract_inverted_index.have	7
abstract_inverted_index.that	125
abstract_inverted_index.this	24
abstract_inverted_index.(ASR)	54
abstract_inverted_index.LLMs,	57
abstract_inverted_index.joint	39, 44
abstract_inverted_index.large	3
abstract_inverted_index.model	63
abstract_inverted_index.paper	25
abstract_inverted_index.seven	102
abstract_inverted_index.their	9
abstract_inverted_index.under	86
abstract_inverted_index.using	112
abstract_inverted_index.which	100
abstract_inverted_index.(LLMs)	6
abstract_inverted_index.CoVoST	97
abstract_inverted_index.MuST-C	95
abstract_inverted_index.Recent	0
abstract_inverted_index.across	12
abstract_inverted_index.alone.	139
abstract_inverted_index.better	133
abstract_inverted_index.effect	80
abstract_inverted_index.models	5
abstract_inverted_index.speech	34, 47, 52
abstract_inverted_index.tasks,	104
abstract_inverted_index.tasks.	15
abstract_inverted_index.yields	132
abstract_inverted_index.Further	121
abstract_inverted_index.GitHub.	155
abstract_inverted_index.Through	42
abstract_inverted_index.context	83, 143
abstract_inverted_index.explore	78
abstract_inverted_index.further	123
abstract_inverted_index.improve	30
abstract_inverted_index.include	101
abstract_inverted_index.jointly	126
abstract_inverted_index.popular	114
abstract_inverted_index.release	149
abstract_inverted_index.results	92
abstract_inverted_index.several	113
abstract_inverted_index.success	19
abstract_inverted_index.various	13
abstract_inverted_index.Inspired	16
abstract_inverted_index.analysis	122
abstract_inverted_index.approach	111
abstract_inverted_index.compared	135
abstract_inverted_index.datasets	153
abstract_inverted_index.enhances	145
abstract_inverted_index.improved	66
abstract_inverted_index.language	4, 14
abstract_inverted_index.learning	71
abstract_inverted_index.process.	41
abstract_inverted_index.proposed	110
abstract_inverted_index.refining	127, 137
abstract_inverted_index.suggests	124
abstract_inverted_index.automatic	51
abstract_inverted_index.datasets,	99
abstract_inverted_index.including	116
abstract_inverted_index.scenario.	90
abstract_inverted_index.LLaMA3-8B,	118
abstract_inverted_index.Meanwhile,	140
abstract_inverted_index.in-context	70
abstract_inverted_index.refinement	40, 45, 85, 146
abstract_inverted_index.remarkable	10
abstract_inverted_index.scenarios.	75
abstract_inverted_index.demonstrate	105
abstract_inverted_index.fine-tuning	74, 89
abstract_inverted_index.introducing	37
abstract_inverted_index.performance	32, 59, 134
abstract_inverted_index.recognition	53
abstract_inverted_index.refinement,	23
abstract_inverted_index.translation	22, 35, 48, 103, 131, 138
abstract_inverted_index.Experimental	91
abstract_inverted_index.Mistral-12B.	120
abstract_inverted_index.advancements	1
abstract_inverted_index.capabilities	11
abstract_inverted_index.demonstrated	8
abstract_inverted_index.investigates	26
abstract_inverted_index.performance.	147
abstract_inverted_index.text-to-text	21
abstract_inverted_index.Additionally,	76
abstract_inverted_index.context-aware	88
abstract_inverted_index.effectiveness	107
abstract_inverted_index.incorporating	141
abstract_inverted_index.significantly	65, 144
abstract_inverted_index.training-free	69
abstract_inverted_index.transcription	55, 129
abstract_inverted_index.GPT-3.5-turbo,	117
abstract_inverted_index.document-level	82, 142
abstract_inverted_index.parameter-efficient	73
cited_by_percentile_year
countries_distinct_count	0
institutions_distinct_count	6
citation_normalized_percentile
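
The `abstract_inverted_index.*` rows in the payload above store the abstract as a map from each word to the positions where it occurs. The following is a minimal sketch of turning those rows back into running text, assuming each row is a tab-separated `key<TAB>value` pair as shown in this record; the function name `rebuild_abstract` is hypothetical, and the sample uses only a handful of rows copied from the payload.

```python
def rebuild_abstract(rows):
    """Rebuild text from flattened 'abstract_inverted_index.<word>' rows.

    Each value is a comma-separated list of word positions. The word is
    taken as everything after the fixed prefix, so tokens that contain
    dots themselves (for example 'tasks.' or 'GPT-3.5-turbo,') survive.
    """
    prefix = "abstract_inverted_index."
    by_position = {}
    for row in rows:
        key, _, value = row.partition("\t")
        # Skip unrelated payload keys and keys with an empty value.
        if not key.startswith(prefix) or not value.strip():
            continue
        word = key[len(prefix):]
        for pos in value.split(","):
            by_position[int(pos)] = word
    # Emit the words in position order.
    return " ".join(by_position[i] for i in sorted(by_position))


sample_rows = [
    "abstract_inverted_index.in\t2, 67",
    "abstract_inverted_index.large\t3",
    "abstract_inverted_index.(LLMs)\t6",
    "abstract_inverted_index.models\t5",
    "abstract_inverted_index.Recent\t0",
    "abstract_inverted_index.language\t4, 14",
    "abstract_inverted_index.advancements\t1",
    "cited_by_count\t0",
]

# Only a subset of rows is supplied, so just the first seven positions
# (0 through 6) form a contiguous run of text.
words = rebuild_abstract(sample_rows).split()
print(" ".join(words[:7]))
```

Feeding all `abstract_inverted_index.*` rows of the payload to the same function would yield the abstract exactly as the payload stores it, including any wording that the displayed abstract normalizes.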