E2E Segmenter: Joint Segmenting and Decoding for Long-Form ASR
2022 · Open Access
· DOI: https://doi.org/10.48550/arxiv.2204.10749
Improving the performance of end-to-end ASR models on long utterances ranging from minutes to hours in length is an ongoing challenge in speech recognition. A common solution is to segment the audio in advance using a separate voice activity detector (VAD) that decides segment boundary locations based purely on acoustic speech/non-speech information. VAD segmenters, however, may be sub-optimal for real-world speech where, e.g., a complete sentence that should be taken as a whole may contain hesitations in the middle ("set an alarm for... 5 o'clock"). We propose to replace the VAD with an end-to-end ASR model capable of predicting segment boundaries in a streaming fashion, allowing the segmentation decision to be conditioned not only on better acoustic features but also on semantic features from the decoded text with negligible extra computation. In experiments on real-world long-form audio (YouTube) with lengths of up to 30 minutes, we demonstrate 8.5% relative WER improvement and 250 ms reduction in median end-of-segment latency compared to the VAD segmenter baseline on a state-of-the-art Conformer RNN-T model.
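The key idea in the abstract, conditioning the segmentation decision on both acoustic evidence and the decoded text, can be illustrated with a toy streaming loop. This is a minimal conceptual sketch only, not the authors' model: the frame "energies", the threshold, and the `acoustic_silence_score` / `looks_like_sentence_end` helpers are hypothetical stand-ins for the learned components of the streaming Conformer RNN-T described in the paper.

```python
# Toy illustration (not the paper's implementation): a streaming loop that
# decides segment boundaries from BOTH an acoustic silence score and the
# decoded-text context, instead of acoustics alone as a VAD would.

def acoustic_silence_score(frame_energy: float) -> float:
    """Hypothetical stand-in: higher means the frame looks like non-speech."""
    return 1.0 if frame_energy < 0.1 else 0.0

def looks_like_sentence_end(hypothesis: str) -> bool:
    """Hypothetical stand-in for the semantic cue a decoder could provide."""
    return hypothesis.rstrip().endswith(("o'clock", "today", "thanks"))

def segment_stream(frames, decoded_words, silence_thresh=0.5):
    """Emit a segment boundary only when acoustics AND text agree."""
    hypothesis, segments = "", []
    for energy, word in zip(frames, decoded_words):
        if word:
            hypothesis += (" " if hypothesis else "") + word
        silent = acoustic_silence_score(energy) >= silence_thresh
        # A pure VAD would cut here on `silent` alone, splitting
        # "set an alarm for ... 5 o'clock" at the hesitation.
        if silent and looks_like_sentence_end(hypothesis):
            segments.append(hypothesis)
            hypothesis = ""
    if hypothesis:
        segments.append(hypothesis)
    return segments

if __name__ == "__main__":
    frames = [0.9, 0.8, 0.9, 0.8, 0.05, 0.05, 0.9, 0.9]   # per-frame "energy"
    words  = ["set", "an", "alarm", "for", "", "", "5", "o'clock"]
    print(segment_stream(frames, words))
    # -> ['set an alarm for 5 o'clock']  (the mid-utterance pause is not a boundary)
```

In the paper itself the boundary decision is a prediction of the streaming end-to-end ASR model rather than hand-written rules, which is what lets it use richer acoustic features and the decoded text at negligible extra cost.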
At a glance
- Type: preprint
- Language: en
- Landing page: http://arxiv.org/abs/2204.10749
- PDF: https://arxiv.org/pdf/2204.10749
- OA status: green
- Related works: 10
- OpenAlex ID: https://openalex.org/W4360601747
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4360601747 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2204.10749 (Digital Object Identifier)
- Title: E2E Segmenter: Joint Segmenting and Decoding for Long-Form ASR
- Type: preprint (OpenAlex work type)
- Language: en
- Publication year: 2022
- Publication date: 2022-04-22
- Authors (in order): W. Ronny Huang, Shuo-Yiin Chang, David Rybach, Rohit Prabhavalkar, Tara N. Sainath, Cyril Allauzen, Cal Peyser, Zhiyun Lu
- Landing page: https://arxiv.org/abs/2204.10749
- PDF URL: https://arxiv.org/pdf/2204.10749
- Open access: yes
- OA status: green (per OpenAlex)
- OA URL: https://arxiv.org/pdf/2204.10749
- Concepts (top concepts attached by OpenAlex): Computer science, Speech recognition, Segmentation, Sentence, Market segmentation, Latency (audio), Set (abstract data type), False alarm, Decoding methods, End-to-end principle, ALARM, Artificial intelligence, Algorithm, Telecommunications, Marketing, Materials science, Composite material, Business, Programming language
- Cited by: 0 (total citation count in OpenAlex)
- Related works: 10 (other works algorithmically related by OpenAlex)
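The fields above come from the public OpenAlex API, and the full record for this work can be retrieved directly from it. A minimal sketch using the `requests` package (assuming network access; the work ID and the printed field names are the ones shown on this page):

```python
# Minimal sketch: fetch this work's OpenAlex record and print a few fields.
import requests

WORK_ID = "W4360601747"
resp = requests.get(f"https://api.openalex.org/works/{WORK_ID}", timeout=30)
resp.raise_for_status()
work = resp.json()

print(work["display_name"])           # title
print(work["publication_date"])       # 2022-04-22
print(work["open_access"]["oa_url"])  # direct OA link, if any
```

The JSON returned by this call is the source of the full payload listed below.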
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4360601747 |
| doi | https://doi.org/10.48550/arxiv.2204.10749 |
| ids.doi | https://doi.org/10.48550/arxiv.2204.10749 |
| ids.openalex | https://openalex.org/W4360601747 |
| fwci | |
| type | preprint |
| title | E2E Segmenter: Joint Segmenting and Decoding for Long-Form ASR |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10201 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9998000264167786 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Speech Recognition and Synthesis |
| topics[1].id | https://openalex.org/T10860 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9983000159263611 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1711 |
| topics[1].subfield.display_name | Signal Processing |
| topics[1].display_name | Speech and Audio Processing |
| topics[2].id | https://openalex.org/T11309 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9976000189781189 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1711 |
| topics[2].subfield.display_name | Signal Processing |
| topics[2].display_name | Music and Audio Processing |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.7312777042388916 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C28490314 |
| concepts[1].level | 1 |
| concepts[1].score | 0.6845752596855164 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q189436 |
| concepts[1].display_name | Speech recognition |
| concepts[2].id | https://openalex.org/C89600930 |
| concepts[2].level | 2 |
| concepts[2].score | 0.6802265644073486 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q1423946 |
| concepts[2].display_name | Segmentation |
| concepts[3].id | https://openalex.org/C2777530160 |
| concepts[3].level | 2 |
| concepts[3].score | 0.5759625434875488 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q41796 |
| concepts[3].display_name | Sentence |
| concepts[4].id | https://openalex.org/C125308379 |
| concepts[4].level | 2 |
| concepts[4].score | 0.5214880704879761 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q363057 |
| concepts[4].display_name | Market segmentation |
| concepts[5].id | https://openalex.org/C82876162 |
| concepts[5].level | 2 |
| concepts[5].score | 0.4913095533847809 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q17096504 |
| concepts[5].display_name | Latency (audio) |
| concepts[6].id | https://openalex.org/C177264268 |
| concepts[6].level | 2 |
| concepts[6].score | 0.4585984945297241 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q1514741 |
| concepts[6].display_name | Set (abstract data type) |
| concepts[7].id | https://openalex.org/C2776836416 |
| concepts[7].level | 2 |
| concepts[7].score | 0.45825234055519104 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q1364844 |
| concepts[7].display_name | False alarm |
| concepts[8].id | https://openalex.org/C57273362 |
| concepts[8].level | 2 |
| concepts[8].score | 0.4573310315608978 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q576722 |
| concepts[8].display_name | Decoding methods |
| concepts[9].id | https://openalex.org/C74296488 |
| concepts[9].level | 2 |
| concepts[9].score | 0.43564122915267944 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q2527392 |
| concepts[9].display_name | End-to-end principle |
| concepts[10].id | https://openalex.org/C2779119184 |
| concepts[10].level | 2 |
| concepts[10].score | 0.4156178832054138 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q294350 |
| concepts[10].display_name | ALARM |
| concepts[11].id | https://openalex.org/C154945302 |
| concepts[11].level | 1 |
| concepts[11].score | 0.33117783069610596 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[11].display_name | Artificial intelligence |
| concepts[12].id | https://openalex.org/C11413529 |
| concepts[12].level | 1 |
| concepts[12].score | 0.10815045237541199 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q8366 |
| concepts[12].display_name | Algorithm |
| concepts[13].id | https://openalex.org/C76155785 |
| concepts[13].level | 1 |
| concepts[13].score | 0.07899779081344604 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q418 |
| concepts[13].display_name | Telecommunications |
| concepts[14].id | https://openalex.org/C162853370 |
| concepts[14].level | 1 |
| concepts[14].score | 0.0 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q39809 |
| concepts[14].display_name | Marketing |
| concepts[15].id | https://openalex.org/C192562407 |
| concepts[15].level | 0 |
| concepts[15].score | 0.0 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q228736 |
| concepts[15].display_name | Materials science |
| concepts[16].id | https://openalex.org/C159985019 |
| concepts[16].level | 1 |
| concepts[16].score | 0.0 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q181790 |
| concepts[16].display_name | Composite material |
| concepts[17].id | https://openalex.org/C144133560 |
| concepts[17].level | 0 |
| concepts[17].score | 0.0 |
| concepts[17].wikidata | https://www.wikidata.org/wiki/Q4830453 |
| concepts[17].display_name | Business |
| concepts[18].id | https://openalex.org/C199360897 |
| concepts[18].level | 1 |
| concepts[18].score | 0.0 |
| concepts[18].wikidata | https://www.wikidata.org/wiki/Q9143 |
| concepts[18].display_name | Programming language |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.7312777042388916 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/speech-recognition |
| keywords[1].score | 0.6845752596855164 |
| keywords[1].display_name | Speech recognition |
| keywords[2].id | https://openalex.org/keywords/segmentation |
| keywords[2].score | 0.6802265644073486 |
| keywords[2].display_name | Segmentation |
| keywords[3].id | https://openalex.org/keywords/sentence |
| keywords[3].score | 0.5759625434875488 |
| keywords[3].display_name | Sentence |
| keywords[4].id | https://openalex.org/keywords/market-segmentation |
| keywords[4].score | 0.5214880704879761 |
| keywords[4].display_name | Market segmentation |
| keywords[5].id | https://openalex.org/keywords/latency |
| keywords[5].score | 0.4913095533847809 |
| keywords[5].display_name | Latency (audio) |
| keywords[6].id | https://openalex.org/keywords/set |
| keywords[6].score | 0.4585984945297241 |
| keywords[6].display_name | Set (abstract data type) |
| keywords[7].id | https://openalex.org/keywords/false-alarm |
| keywords[7].score | 0.45825234055519104 |
| keywords[7].display_name | False alarm |
| keywords[8].id | https://openalex.org/keywords/decoding-methods |
| keywords[8].score | 0.4573310315608978 |
| keywords[8].display_name | Decoding methods |
| keywords[9].id | https://openalex.org/keywords/end-to-end-principle |
| keywords[9].score | 0.43564122915267944 |
| keywords[9].display_name | End-to-end principle |
| keywords[10].id | https://openalex.org/keywords/alarm |
| keywords[10].score | 0.4156178832054138 |
| keywords[10].display_name | ALARM |
| keywords[11].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[11].score | 0.33117783069610596 |
| keywords[11].display_name | Artificial intelligence |
| keywords[12].id | https://openalex.org/keywords/algorithm |
| keywords[12].score | 0.10815045237541199 |
| keywords[12].display_name | Algorithm |
| keywords[13].id | https://openalex.org/keywords/telecommunications |
| keywords[13].score | 0.07899779081344604 |
| keywords[13].display_name | Telecommunications |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2204.10749 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | cc-by |
| locations[0].pdf_url | https://arxiv.org/pdf/2204.10749 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | https://openalex.org/licenses/cc-by |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2204.10749 |
| locations[1].id | doi:10.48550/arxiv.2204.10749 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2204.10749 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5091738469 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | W. Ronny Huang |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Huang, W. Ronny |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5001306222 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Shuo-Yiin Chang |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Chang, Shuo-yiin |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5050133412 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | David Rybach |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Rybach, David |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5032640894 |
| authorships[3].author.orcid | https://orcid.org/0000-0001-5331-6058 |
| authorships[3].author.display_name | Rohit Prabhavalkar |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Prabhavalkar, Rohit |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5070513394 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-4126-6556 |
| authorships[4].author.display_name | Tara N. Sainath |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Sainath, Tara N. |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5030888546 |
| authorships[5].author.orcid | |
| authorships[5].author.display_name | Cyril Allauzen |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Allauzen, Cyril |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5037066965 |
| authorships[6].author.orcid | |
| authorships[6].author.display_name | Cal Peyser |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Peyser, Cal |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5039693533 |
| authorships[7].author.orcid | https://orcid.org/0000-0002-1733-4061 |
| authorships[7].author.display_name | Zhiyun Lu |
| authorships[7].author_position | last |
| authorships[7].raw_author_name | Lu, Zhiyun |
| authorships[7].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2204.10749 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | E2E Segmenter: Joint Segmenting and Decoding for Long-Form ASR |
| has_fulltext | True |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10201 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9998000264167786 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Speech Recognition and Synthesis |
| related_works | https://openalex.org/W1584123598, https://openalex.org/W2731305060, https://openalex.org/W2372003537, https://openalex.org/W3179968364, https://openalex.org/W2732807254, https://openalex.org/W2587670262, https://openalex.org/W3037375888, https://openalex.org/W2366730739, https://openalex.org/W3121346907, https://openalex.org/W4379535633 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2204.10749 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | cc-by |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2204.10749 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2204.10749 |
| primary_location.id | pmh:oai:arXiv.org:2204.10749 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | cc-by |
| primary_location.pdf_url | https://arxiv.org/pdf/2204.10749 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | https://openalex.org/licenses/cc-by |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2204.10749 |
| publication_date | 2022-04-22 |
| publication_year | 2022 |
| referenced_works_count | 0 |
| abstract_inverted_index | word → token-position index encoding the abstract shown at the top of this page (see the reconstruction sketch below the table) |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 8 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/16 |
| sustainable_development_goals[0].score | 0.7699999809265137 |
| sustainable_development_goals[0].display_name | Peace, Justice and strong institutions |
| citation_normalized_percentile |
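OpenAlex stores abstracts as an inverted index (`abstract_inverted_index`: each word mapped to the list of token positions where it occurs) rather than as plain text. A short sketch of how that field can be turned back into the abstract string shown at the top of this page; the tiny `sample` index is taken from the first tokens of this paper's abstract.

```python
# Rebuild a plain-text abstract from OpenAlex's abstract_inverted_index field,
# which maps each word to the list of token positions where it occurs.
def reconstruct_abstract(inverted_index):
    position_to_word = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            position_to_word[pos] = word
    return " ".join(position_to_word[pos] for pos in sorted(position_to_word))

# Tiny example in the same format (first four tokens of this paper's abstract):
sample = {"Improving": [0], "the": [1], "performance": [2], "of": [3]}
print(reconstruct_abstract(sample))  # -> Improving the performance of
```

Applying the same function to the full `abstract_inverted_index` of this record reproduces the abstract quoted above.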