Masked Self-distilled Transducer-based Keyword Spotting with Semi-autoregressive Decoding
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2505.24820
RNN-T-based keyword spotting (KWS) with autoregressive (AR) decoding has gained attention due to its streaming architecture and superior performance. However, the simplicity of the prediction network in RNN-T makes the model prone to overfitting, especially under challenging scenarios, which degrades performance. In this paper, we propose a masked self-distillation (MSD) training strategy that keeps RNN-Ts from relying too heavily on the prediction network and thereby alleviates overfitting. Such training enables masked non-autoregressive (NAR) decoding, which fully masks the RNN-T predictor output during KWS decoding. In addition, we propose a semi-autoregressive (SAR) decoding approach that integrates the advantages of AR and NAR decoding. Our experiments across multiple KWS datasets demonstrate that MSD training effectively alleviates overfitting. The SAR decoding method preserves the superior performance of AR decoding while benefiting from the overfitting suppression of NAR decoding, achieving excellent results.
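To make the idea of "fully masking the predictor output" concrete, here is a minimal toy sketch. It is not the authors' implementation: the additive `tanh` joint, the `mask` scalar, and all vectors and names below are hypothetical, chosen only to illustrate that once the predictor term is masked out, the joint output depends on the acoustic encoder alone and no longer on previously emitted tokens. The abstract does not specify how SAR decoding combines the two modes, so that part is not sketched.

```python
import math

def joint(enc, pred, mask=1.0):
    """Toy additive joint: tanh(enc + mask * pred), element-wise.

    mask=1.0 keeps the predictor output (AR-style decoding);
    mask=0.0 removes it entirely (the fully masked, NAR-style case).
    This joint form is an assumption made for illustration only.
    """
    return [math.tanh(e + mask * p) for e, p in zip(enc, pred)]

enc = [0.2, -0.5, 1.0]      # hypothetical encoder output for one frame
pred_a = [0.3, 0.1, -0.4]   # hypothetical predictor state after token history A
pred_b = [-0.7, 0.9, 0.2]   # hypothetical predictor state after token history B

# With the predictor active, different token histories give different outputs.
ar_a = joint(enc, pred_a, mask=1.0)
ar_b = joint(enc, pred_b, mask=1.0)

# With the predictor fully masked, the token history no longer matters.
nar_a = joint(enc, pred_a, mask=0.0)
nar_b = joint(enc, pred_b, mask=0.0)
```

In this toy setting `nar_a` and `nar_b` are identical, while `ar_a` and `ar_b` differ, which is the sense in which masked decoding is non-autoregressive.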
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2505.24820, https://arxiv.org/pdf/2505.24820
- OA Status: green
- OpenAlex ID: https://openalex.org/W4414858722
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4414858722 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2505.24820 (Digital Object Identifier)
- Title: Masked Self-distilled Transducer-based Keyword Spotting with Semi-autoregressive Decoding (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025 (year of publication)
- Publication date: 2025-05-30 (full publication date if available)
- Authors: Yu Xi, Xiaoyu Gu, Haoyu Li, Jun Song, B. Zheng, Kai Yu (list of authors in order)
- Landing page: https://arxiv.org/abs/2505.24820 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2505.24820 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2505.24820 (direct OA link when available)
- Cited by: 0 (total citation count in OpenAlex)
Full payload
| field | value |
|---|---|
| id | https://openalex.org/W4414858722 |
| doi | https://doi.org/10.48550/arxiv.2505.24820 |
| ids.doi | https://doi.org/10.48550/arxiv.2505.24820 |
| ids.openalex | https://openalex.org/W4414858722 |
| fwci | 0.0 |
| type | preprint |
| title | Masked Self-distilled Transducer-based Keyword Spotting with Semi-autoregressive Decoding |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T13083 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9764000177383423 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Advanced Text Analysis Techniques |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2505.24820 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2505.24820 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2505.24820 |
| locations[1].id | doi:10.48550/arxiv.2505.24820 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2505.24820 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5102383536 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Yu Xi |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Xi, Yu |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5015718206 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-2929-2441 |
| authorships[1].author.display_name | Xiaoyu Gu |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Gu, Xiaoyu |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5091165662 |
| authorships[2].author.orcid | https://orcid.org/0000-0001-8015-5165 |
| authorships[2].author.display_name | Haoyu Li |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Li, Haoyu |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5101794446 |
| authorships[3].author.orcid | https://orcid.org/0009-0000-5875-484X |
| authorships[3].author.display_name | Jun Song |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Song, Jun |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5050479679 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-6544-429X |
| authorships[4].author.display_name | B. Zheng |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Zheng, Bo |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5001247903 |
| authorships[5].author.orcid | https://orcid.org/0000-0002-5453-8725 |
| authorships[5].author.display_name | Kai Yu |
| authorships[5].author_position | last |
| authorships[5].raw_author_name | Yu, Kai |
| authorships[5].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2505.24820 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Masked Self-distilled Transducer-based Keyword Spotting with Semi-autoregressive Decoding |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T13083 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9764000177383423 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Advanced Text Analysis Techniques |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2505.24820 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2505.24820 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2505.24820 |
| primary_location.id | pmh:oai:arXiv.org:2505.24820 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2505.24820 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2505.24820 |
| publication_date | 2025-05-30 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 44, 82 |
| abstract_inverted_index.AR | 92, 118 |
| abstract_inverted_index.In | 39, 78 |
| abstract_inverted_index.an | 28 |
| abstract_inverted_index.in | 25, 36 |
| abstract_inverted_index.of | 21, 91, 117, 126 |
| abstract_inverted_index.on | 55 |
| abstract_inverted_index.to | 11, 58, 87 |
| abstract_inverted_index.we | 42, 80 |
| abstract_inverted_index.KWS | 76, 100 |
| abstract_inverted_index.MSD | 104 |
| abstract_inverted_index.NAR | 94, 127 |
| abstract_inverted_index.Our | 96 |
| abstract_inverted_index.SAR | 110 |
| abstract_inverted_index.The | 109 |
| abstract_inverted_index.and | 15, 93 |
| abstract_inverted_index.due | 10 |
| abstract_inverted_index.has | 7 |
| abstract_inverted_index.its | 12 |
| abstract_inverted_index.the | 19, 22, 71, 89, 114, 123 |
| abstract_inverted_index.Such | 61 |
| abstract_inverted_index.from | 122 |
| abstract_inverted_index.that | 50, 103 |
| abstract_inverted_index.this | 40 |
| abstract_inverted_index.with | 4 |
| abstract_inverted_index.(KWS) | 3 |
| abstract_inverted_index.(MSD) | 47 |
| abstract_inverted_index.(NAR) | 66 |
| abstract_inverted_index.(SAR) | 84 |
| abstract_inverted_index.RNN-T | 26, 72 |
| abstract_inverted_index.fully | 69 |
| abstract_inverted_index.masks | 70 |
| abstract_inverted_index.poses | 27 |
| abstract_inverted_index.under | 32 |
| abstract_inverted_index.which | 68 |
| abstract_inverted_index.while | 120 |
| abstract_inverted_index.RNN-Ts | 52 |
| abstract_inverted_index.across | 98 |
| abstract_inverted_index.avoids | 51 |
| abstract_inverted_index.during | 75 |
| abstract_inverted_index.gained | 8 |
| abstract_inverted_index.issue, | 30 |
| abstract_inverted_index.masked | 45, 64 |
| abstract_inverted_index.method | 112 |
| abstract_inverted_index.output | 74 |
| abstract_inverted_index.overly | 53 |
| abstract_inverted_index.paper, | 41 |
| abstract_inverted_index.enables | 63 |
| abstract_inverted_index.keyword | 1 |
| abstract_inverted_index.network | 24 |
| abstract_inverted_index.propose | 43, 81 |
| abstract_inverted_index.relying | 54 |
| abstract_inverted_index.However, | 18 |
| abstract_inverted_index.approach | 86 |
| abstract_inverted_index.benefits | 121 |
| abstract_inverted_index.datasets | 101 |
| abstract_inverted_index.decoding | 85, 111, 119 |
| abstract_inverted_index.degraded | 37 |
| abstract_inverted_index.multiple | 99 |
| abstract_inverted_index.networks | 57 |
| abstract_inverted_index.results. | 131 |
| abstract_inverted_index.spotting | 2 |
| abstract_inverted_index.strategy | 49 |
| abstract_inverted_index.superior | 16, 115 |
| abstract_inverted_index.training | 48, 62, 105 |
| abstract_inverted_index.achieving | 129 |
| abstract_inverted_index.addition, | 79 |
| abstract_inverted_index.alleviate | 59 |
| abstract_inverted_index.attention | 9 |
| abstract_inverted_index.decoding, | 67, 128 |
| abstract_inverted_index.decoding. | 77, 95 |
| abstract_inverted_index.excellent | 130 |
| abstract_inverted_index.integrate | 88 |
| abstract_inverted_index.predictor | 73 |
| abstract_inverted_index.preserves | 113 |
| abstract_inverted_index.resulting | 35 |
| abstract_inverted_index.streaming | 13 |
| abstract_inverted_index.advantages | 90 |
| abstract_inverted_index.alleviates | 107 |
| abstract_inverted_index.especially | 31 |
| abstract_inverted_index.prediction | 23, 56 |
| abstract_inverted_index.scenarios, | 34 |
| abstract_inverted_index.simplicity | 20 |
| abstract_inverted_index.RNN-T-based | 0 |
| abstract_inverted_index.challenging | 33 |
| abstract_inverted_index.demonstrate | 102 |
| abstract_inverted_index.effectively | 106 |
| abstract_inverted_index.experiments | 97 |
| abstract_inverted_index.overfitting | 29, 124 |
| abstract_inverted_index.performance | 116 |
| abstract_inverted_index.suppression | 125 |
| abstract_inverted_index.architecture | 14 |
| abstract_inverted_index.overfitting. | 60, 108 |
| abstract_inverted_index.performance. | 17, 38 |
| abstract_inverted_index.decoding~(AR) | 6 |
| abstract_inverted_index.autoregressive | 5 |
| abstract_inverted_index.self-distillation | 46 |
| abstract_inverted_index.non-autoregressive | 65 |
| abstract_inverted_index.semi-autoregressive | 83 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 6 |
| citation_normalized_percentile | |
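The `abstract_inverted_index.*` rows above store the abstract as a map from each token to the word positions where it occurs. As a minimal sketch of how that structure can be turned back into running text, the snippet below inverts the map and joins the tokens in position order. The function name `rebuild_abstract` and the `sample` dictionary are illustrative; `sample` copies only the first ten positions from the table, and tokens are kept exactly as stored (including the raw `~` in `decoding~(AR)`).

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a {token: [positions]} inverted index."""
    positions = {}
    for word, idxs in inverted_index.items():
        for i in idxs:
            positions[i] = word  # each position holds exactly one token
    # Emit tokens in ascending position order, separated by single spaces.
    return " ".join(positions[i] for i in sorted(positions))

# First ten positions, copied from the abstract_inverted_index rows above.
sample = {
    "RNN-T-based": [0],
    "keyword": [1],
    "spotting": [2],
    "(KWS)": [3],
    "with": [4],
    "autoregressive": [5],
    "decoding~(AR)": [6],
    "has": [7],
    "gained": [8],
    "attention": [9],
}

text = rebuild_abstract(sample)
print(text)
# RNN-T-based keyword spotting (KWS) with autoregressive decoding~(AR) has gained attention
```

Passing the full set of rows in the same form should reproduce the whole abstract, with punctuation attached to tokens as it appears in the index.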