SALM-Duplex: Efficient and Direct Duplex Modeling for Speech-to-Speech Language Model


Ke Hu, Ehsan Hosseini-Asl, Chen Chen, Edresson Casanova, Subhankar Ghosh, Piotr Żelasko, Zhehuai Chen, Jason Li, Jagadeesh Balam, Boris Ginsburg

2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2505.15670

Spoken dialogue is an intuitive form of human-computer interaction, yet current speech language models often remain constrained to turn-based exchanges, lacking real-time adaptability such as user barge-in. We propose a novel duplex speech-to-speech (S2S) architecture featuring continuous user inputs and codec agent outputs with channel fusion that directly models simultaneous user and agent streams. Using a pretrained streaming encoder for user input enables the first duplex S2S model that does not require speech pretraining. Separate architectures for agent and user modeling facilitate codec fine-tuning for better agent voices and halve the bitrate (0.6 kbps) compared to previous works. Experimental results show that the proposed model outperforms previous duplex models in reasoning, turn-taking, and barge-in abilities. Because speech pretraining is skipped, the model requires significantly less speech data, which markedly simplifies the process of building a duplex S2S model from any LLM. Finally, it is the first openly available duplex S2S model with training and inference code to foster reproducibility.



Metadata

Type: preprint
Language: en
Landing Page: http://arxiv.org/abs/2505.15670
PDF: https://arxiv.org/pdf/2505.15670
OA Status: green
OpenAlex ID: https://openalex.org/W4417262065

All OpenAlex metadata


OpenAlex ID: https://openalex.org/W4417262065
Canonical identifier for this work in OpenAlex

DOI: https://doi.org/10.48550/arxiv.2505.15670
Digital Object Identifier

Title: SALM-Duplex: Efficient and Direct Duplex Modeling for Speech-to-Speech Language Model
Work title

Type: preprint
OpenAlex work type

Language: en
Primary language

Publication year: 2025
Year of publication

Publication date: 2025-05-21
Full publication date if available

Authors: Ke Hu, Ehsan Hosseini-Asl, Chen Chen, Edresson Casanova, Subhankar Ghosh, Piotr Żelasko, Zhehuai Chen, Jason Li, Jagadeesh Balam, Boris Ginsburg
List of authors in order

Landing page: https://arxiv.org/abs/2505.15670
Publisher landing page

PDF URL: https://arxiv.org/pdf/2505.15670
Direct link to full text PDF

Open access: Yes
Whether a free full text is available

OA status: green
Open access status per OpenAlex

OA URL: https://arxiv.org/pdf/2505.15670
Direct OA link when available

Cited by: 0
Total citation count in OpenAlex

Full payload

id	https://openalex.org/W4417262065
doi	https://doi.org/10.48550/arxiv.2505.15670
ids.doi	https://doi.org/10.48550/arxiv.2505.15670
ids.openalex	https://openalex.org/W4417262065
fwci
type	preprint
title	SALM-Duplex: Efficient and Direct Duplex Modeling for Speech-to-Speech Language Model
biblio.issue
biblio.volume
biblio.last_page
biblio.first_page
is_xpac	False
apc_list
apc_paid
language	en
locations[0].id	pmh:oai:arXiv.org:2505.15670
locations[0].is_oa	True
locations[0].source.id	https://openalex.org/S4306400194
locations[0].source.issn
locations[0].source.type	repository
locations[0].source.is_oa	True
locations[0].source.issn_l
locations[0].source.is_core	False
locations[0].source.is_in_doaj	False
locations[0].source.display_name	arXiv (Cornell University)
locations[0].source.host_organization	https://openalex.org/I205783295
locations[0].source.host_organization_name	Cornell University
locations[0].source.host_organization_lineage	https://openalex.org/I205783295
locations[0].license
locations[0].pdf_url	https://arxiv.org/pdf/2505.15670
locations[0].version	submittedVersion
locations[0].raw_type	text
locations[0].license_id
locations[0].is_accepted	False
locations[0].is_published	False
locations[0].raw_source_name
locations[0].landing_page_url	http://arxiv.org/abs/2505.15670
locations[1].id	doi:10.48550/arxiv.2505.15670
locations[1].is_oa	True
locations[1].source.id	https://openalex.org/S4306400194
locations[1].source.issn
locations[1].source.type	repository
locations[1].source.is_oa	True
locations[1].source.issn_l
locations[1].source.is_core	False
locations[1].source.is_in_doaj	False
locations[1].source.display_name	arXiv (Cornell University)
locations[1].source.host_organization	https://openalex.org/I205783295
locations[1].source.host_organization_name	Cornell University
locations[1].source.host_organization_lineage	https://openalex.org/I205783295
locations[1].license
locations[1].pdf_url
locations[1].version
locations[1].raw_type	article
locations[1].license_id
locations[1].is_accepted	False
locations[1].is_published
locations[1].raw_source_name
locations[1].landing_page_url	https://doi.org/10.48550/arxiv.2505.15670
indexed_in	arxiv, datacite
authorships[0].author.id	https://openalex.org/A5029338576
authorships[0].author.orcid	https://orcid.org/0000-0002-1599-1519
authorships[0].author.display_name	Ke Hu
authorships[0].author_position	first
authorships[0].raw_author_name	Hu, Ke
authorships[0].is_corresponding	False
authorships[1].author.id	https://openalex.org/A5020404017
authorships[1].author.orcid
authorships[1].author.display_name	Ehsan Hosseini-Asl
authorships[1].author_position	middle
authorships[1].raw_author_name	Hosseini-Asl, Ehsan
authorships[1].is_corresponding	False
authorships[2].author.id	https://openalex.org/A5100418547
authorships[2].author.orcid	https://orcid.org/0000-0003-1957-6432
authorships[2].author.display_name	Chen Chen
authorships[2].author_position	middle
authorships[2].raw_author_name	Chen, Chen
authorships[2].is_corresponding	False
authorships[3].author.id	https://openalex.org/A5025318789
authorships[3].author.orcid	https://orcid.org/0000-0003-0160-7173
authorships[3].author.display_name	Edresson Casanova
authorships[3].author_position	middle
authorships[3].raw_author_name	Casanova, Edresson
authorships[3].is_corresponding	False
authorships[4].author.id	https://openalex.org/A5101935896
authorships[4].author.orcid	https://orcid.org/0000-0001-9191-5684
authorships[4].author.display_name	Subhankar Ghosh
authorships[4].author_position	middle
authorships[4].raw_author_name	Ghosh, Subhankar
authorships[4].is_corresponding	False
authorships[5].author.id	https://openalex.org/A5027217976
authorships[5].author.orcid	https://orcid.org/0000-0002-8245-0413
authorships[5].author.display_name	Piotr Żelasko
authorships[5].author_position	middle
authorships[5].raw_author_name	Żelasko, Piotr
authorships[5].is_corresponding	False
authorships[6].author.id	https://openalex.org/A5002433660
authorships[6].author.orcid	https://orcid.org/0000-0003-4400-5340
authorships[6].author.display_name	Zhehuai Chen
authorships[6].author_position	middle
authorships[6].raw_author_name	Chen, Zhehuai
authorships[6].is_corresponding	False
authorships[7].author.id	https://openalex.org/A5100762970
authorships[7].author.orcid	https://orcid.org/0000-0002-1150-3549
authorships[7].author.display_name	Jason Li
authorships[7].author_position	middle
authorships[7].raw_author_name	Li, Jason
authorships[7].is_corresponding	False
authorships[8].author.id	https://openalex.org/A5040747392
authorships[8].author.orcid
authorships[8].author.display_name	Jagadeesh Balam
authorships[8].author_position	middle
authorships[8].raw_author_name	Balam, Jagadeesh
authorships[8].is_corresponding	False
authorships[9].author.id	https://openalex.org/A5032957280
authorships[9].author.orcid
authorships[9].author.display_name	Boris Ginsburg
authorships[9].author_position	last
authorships[9].raw_author_name	Ginsburg, Boris
authorships[9].is_corresponding	False
has_content.pdf	True
has_content.grobid_xml	True
is_paratext	False
open_access.is_oa	True
open_access.oa_url	https://arxiv.org/pdf/2505.15670
open_access.oa_status	green
open_access.any_repository_has_fulltext	False
created_date	2025-10-10T00:00:00
display_name	SALM-Duplex: Efficient and Direct Duplex Modeling for Speech-to-Speech Language Model
has_fulltext	False
is_retracted	False
updated_date	2025-12-12T05:47:21.961719
primary_topic
cited_by_count	0
locations_count	2
best_oa_location.id	pmh:oai:arXiv.org:2505.15670
best_oa_location.is_oa	True
best_oa_location.source.id	https://openalex.org/S4306400194
best_oa_location.source.issn
best_oa_location.source.type	repository
best_oa_location.source.is_oa	True
best_oa_location.source.issn_l
best_oa_location.source.is_core	False
best_oa_location.source.is_in_doaj	False
best_oa_location.source.display_name	arXiv (Cornell University)
best_oa_location.source.host_organization	https://openalex.org/I205783295
best_oa_location.source.host_organization_name	Cornell University
best_oa_location.source.host_organization_lineage	https://openalex.org/I205783295
best_oa_location.license
best_oa_location.pdf_url	https://arxiv.org/pdf/2505.15670
best_oa_location.version	submittedVersion
best_oa_location.raw_type	text
best_oa_location.license_id
best_oa_location.is_accepted	False
best_oa_location.is_published	False
best_oa_location.raw_source_name
best_oa_location.landing_page_url	http://arxiv.org/abs/2505.15670
primary_location.id	pmh:oai:arXiv.org:2505.15670
primary_location.is_oa	True
primary_location.source.id	https://openalex.org/S4306400194
primary_location.source.issn
primary_location.source.type	repository
primary_location.source.is_oa	True
primary_location.source.issn_l
primary_location.source.is_core	False
primary_location.source.is_in_doaj	False
primary_location.source.display_name	arXiv (Cornell University)
primary_location.source.host_organization	https://openalex.org/I205783295
primary_location.source.host_organization_name	Cornell University
primary_location.source.host_organization_lineage	https://openalex.org/I205783295
primary_location.license
primary_location.pdf_url	https://arxiv.org/pdf/2505.15670
primary_location.version	submittedVersion
primary_location.raw_type	text
primary_location.license_id
primary_location.is_accepted	False
primary_location.is_published	False
primary_location.raw_source_name
primary_location.landing_page_url	http://arxiv.org/abs/2505.15670
publication_date	2025-05-21
publication_year	2025
referenced_works_count	0
abstract_inverted_index.a	29, 57, 134
abstract_inverted_index.We	27
abstract_inverted_index.an	3
abstract_inverted_index.as	24, 122
abstract_inverted_index.in	109
abstract_inverted_index.is	2, 125, 143
abstract_inverted_index.it	142
abstract_inverted_index.of	6, 132
abstract_inverted_index.to	17, 33, 95, 156
abstract_inverted_index.S2S	68, 136, 149
abstract_inverted_index.The	115
abstract_inverted_index.and	41, 53, 78, 88, 112, 153
abstract_inverted_index.any	139
abstract_inverted_index.for	61, 76, 84
abstract_inverted_index.the	65, 90, 102, 130, 144
abstract_inverted_index.yet	9
abstract_inverted_index.(0.6	92
abstract_inverted_index.code	155
abstract_inverted_index.form	5
abstract_inverted_index.from	138
abstract_inverted_index.less	119
abstract_inverted_index.show	100
abstract_inverted_index.such	23
abstract_inverted_index.that	48, 101
abstract_inverted_index.user	25, 39, 52, 62, 79
abstract_inverted_index.with	45, 151
abstract_inverted_index.(S2S)	35
abstract_inverted_index.LLMs.	140
abstract_inverted_index.Using	56
abstract_inverted_index.agent	43, 54, 77, 86
abstract_inverted_index.codec	42, 82
abstract_inverted_index.data,	121
abstract_inverted_index.first	66, 145
abstract_inverted_index.halve	89
abstract_inverted_index.input	63
abstract_inverted_index.kbps)	93
abstract_inverted_index.model	69, 104, 116, 137, 150
abstract_inverted_index.novel	30
abstract_inverted_index.often	14
abstract_inverted_index.which	127
abstract_inverted_index.Spoken	0
abstract_inverted_index.better	85
abstract_inverted_index.duplex	31, 67, 107, 135, 148
abstract_inverted_index.foster	157
abstract_inverted_index.fusion	47
abstract_inverted_index.inputs	40
abstract_inverted_index.models	13, 50, 108
abstract_inverted_index.openly	146
abstract_inverted_index.remain	15
abstract_inverted_index.speech	11, 32, 34, 72, 120, 123
abstract_inverted_index.voices	87
abstract_inverted_index.works.	97
abstract_inverted_index.bitrate	91
abstract_inverted_index.channel	46
abstract_inverted_index.current	10
abstract_inverted_index.enables	64
abstract_inverted_index.encoder	60
abstract_inverted_index.lacking	20
abstract_inverted_index.outputs	44
abstract_inverted_index.process	131
abstract_inverted_index.propose	28
abstract_inverted_index.results	99
abstract_inverted_index.without	70
abstract_inverted_index.Finally,	141
abstract_inverted_index.Separate	74
abstract_inverted_index.barge-in	113
abstract_inverted_index.building	133
abstract_inverted_index.compared	94
abstract_inverted_index.dialogue	1
abstract_inverted_index.directly	49
abstract_inverted_index.language	12
abstract_inverted_index.markedly	128
abstract_inverted_index.modeling	80
abstract_inverted_index.pretrain	124
abstract_inverted_index.previous	96, 106
abstract_inverted_index.proposed	103
abstract_inverted_index.requires	117
abstract_inverted_index.skipped,	126
abstract_inverted_index.streams.	55
abstract_inverted_index.training	152
abstract_inverted_index.available	147
abstract_inverted_index.barge-in.	26
abstract_inverted_index.featuring	37
abstract_inverted_index.inference	154
abstract_inverted_index.intuitive	4
abstract_inverted_index.pretrain.	73
abstract_inverted_index.real-time	21
abstract_inverted_index.requiring	71
abstract_inverted_index.streaming	59
abstract_inverted_index.abilities.	114
abstract_inverted_index.continuous	38
abstract_inverted_index.exchanges,	19
abstract_inverted_index.facilitate	81
abstract_inverted_index.pretrained	58
abstract_inverted_index.reasoning,	110
abstract_inverted_index.simplifies	129
abstract_inverted_index.turn-based	18
abstract_inverted_index.constrained	16
abstract_inverted_index.fine-tuning	83
abstract_inverted_index.outperforms	105
abstract_inverted_index.Experimental	98
abstract_inverted_index.adaptability	22
abstract_inverted_index.architecture	36
abstract_inverted_index.interaction,	8
abstract_inverted_index.simultaneous	51
abstract_inverted_index.turn-taking,	111
abstract_inverted_index.architectures	75
abstract_inverted_index.significantly	118
abstract_inverted_index.human-computer	7
abstract_inverted_index.reproducibility.	158
cited_by_percentile_year
countries_distinct_count	0
institutions_distinct_count	10
citation_normalized_percentile
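
The `abstract_inverted_index.*` rows in the payload above store the abstract as a map from each token to the word positions where it occurs, rather than as running text. The following is a minimal sketch of how such an index could be turned back into a string. The function name `rebuild_abstract` and the variable `excerpt` are hypothetical and used only for illustration; the excerpt copies just the entries for positions 0 to 8 from the payload, so tokens that also occur later (such as "is" at 125 and 143) are listed with their first position only.

```python
from typing import Dict, List


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Rebuild running text from a token -> positions mapping."""
    by_position: Dict[int, str] = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    # Emit tokens in ascending position order; gaps in the index are skipped.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Excerpt of the payload covering word positions 0-8 only (illustrative).
excerpt = {
    "Spoken": [0],
    "dialogue": [1],
    "is": [2],
    "an": [3],
    "intuitive": [4],
    "form": [5],
    "of": [6],
    "human-computer": [7],
    "interaction,": [8],
}

print(rebuild_abstract(excerpt))
```

Applied to the full set of rows, the same approach should yield the abstract as indexed by OpenAlex, with punctuation attached to tokens exactly as stored.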