Prompting Large Language Models with Speech Recognition Abilities
2023 · Open Access · DOI: https://doi.org/10.48550/arxiv.2307.11795
Large language models have proven themselves highly flexible, able to solve a wide range of generative tasks, such as abstractive summarization and open-ended question answering. In this paper we extend the capabilities of LLMs by directly attaching a small audio encoder allowing it to perform speech recognition. By directly prepending a sequence of audial embeddings to the text token embeddings, the LLM can be converted to an automatic speech recognition (ASR) system, and be used in the exact same manner as its textual counterpart. Experiments on Multilingual LibriSpeech (MLS) show that incorporating a conformer encoder into the open sourced LLaMA-7B allows it to outperform monolingual baselines by 18% and perform multilingual speech recognition despite LLaMA being trained overwhelmingly on English text. Furthermore, we perform ablation studies to investigate whether the LLM can be completely frozen during training to maintain its original capabilities, scaling up the audio encoder, and increasing the audio encoder striding to generate fewer embeddings. The results from these studies show that multilingual ASR is possible even when the LLM is frozen or when strides of almost 1 second are used in the audio encoder opening up the possibility for LLMs to operate on long-form audio.
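The prepending step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the embedding dimension, and the stride values (80 ms and 960 ms) are hypothetical choices made only for illustration, and the sketch assumes the encoder emits one embedding per stride. It shows how a larger stride shortens the audio part of the sequence that the LLM has to process.

```python
from typing import List

Vector = List[float]


def num_audio_embeddings(duration_ms: int, stride_ms: int) -> int:
    """Embeddings emitted for a clip, assuming one embedding per stride."""
    if stride_ms <= 0:
        raise ValueError("stride_ms must be positive")
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-duration_ms // stride_ms)


def prepend_audio(audio_emb: List[Vector], text_emb: List[Vector]) -> List[Vector]:
    """Place audio embeddings in front of the text token embeddings."""
    if audio_emb and text_emb and len(audio_emb[0]) != len(text_emb[0]):
        raise ValueError("audio and text embeddings must share one dimension")
    return audio_emb + text_emb


# Illustrative values only (hypothetical, not taken from the paper's setup).
dim = 4
clip_ms = 10_000
text = [[0.0] * dim for _ in range(3)]  # three placeholder text token embeddings

for stride_ms in (80, 960):
    n_audio = num_audio_embeddings(clip_ms, stride_ms)
    audio = [[0.1] * dim for _ in range(n_audio)]  # placeholder encoder outputs
    sequence = prepend_audio(audio, text)
    print(stride_ms, n_audio, len(sequence))
```

Under these assumptions, a 10-second clip yields 125 audio embeddings at an 80 ms stride but only 11 at a 960 ms stride, which is why long strides make long-form audio more tractable for a fixed context length.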
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2307.11795, https://arxiv.org/pdf/2307.11795
- OA Status: green
- Cited By: 2
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4385260920
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4385260920 (Canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2307.11795 (Digital Object Identifier)
- Title: Prompting Large Language Models with Speech Recognition Abilities (Work title)
- Type: preprint (OpenAlex work type)
- Language: en (Primary language)
- Publication year: 2023 (Year of publication)
- Publication date: 2023-07-21 (Full publication date if available)
- Authors: Yassir Fathullah, Chunyang Wu, Egor Lakomkin, Junteng Jia, Yuan Shangguan, Ke Li, Jinxi Guo, Wenhan Xiong, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer (List of authors in order)
- Landing page: https://arxiv.org/abs/2307.11795 (Publisher landing page)
- PDF URL: https://arxiv.org/pdf/2307.11795 (Direct link to full text PDF)
- Open access: Yes (Whether a free full text is available)
- OA status: green (Open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2307.11795 (Direct OA link when available)
- Concepts: Computer science, Encoder, Speech recognition, Automatic summarization, Security token, Language model, Natural language processing, Acoustic model, Artificial intelligence, Speech processing, Operating system, Computer security (Top concepts (fields/topics) attached by OpenAlex)
- Cited by: 2 (Total citation count in OpenAlex)
- Citations by year (recent): 2025: 1, 2024: 1 (Per-year citation counts, last 5 years)
- Related works (count): 10 (Other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4385260920 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2307.11795 |
| ids.doi | https://doi.org/10.48550/arxiv.2307.11795 |
| ids.openalex | https://openalex.org/W4385260920 |
| fwci | |
| type | preprint |
| title | Prompting Large Language Models with Speech Recognition Abilities |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10201 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9987000226974487 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Speech Recognition and Synthesis |
| topics[1].id | https://openalex.org/T10028 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9987000226974487 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Topic Modeling |
| topics[2].id | https://openalex.org/T10181 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9975000023841858 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Natural Language Processing Techniques |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.7731028199195862 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C118505674 |
| concepts[1].level | 2 |
| concepts[1].score | 0.735184371471405 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q42586063 |
| concepts[1].display_name | Encoder |
| concepts[2].id | https://openalex.org/C28490314 |
| concepts[2].level | 1 |
| concepts[2].score | 0.6685544848442078 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q189436 |
| concepts[2].display_name | Speech recognition |
| concepts[3].id | https://openalex.org/C170858558 |
| concepts[3].level | 2 |
| concepts[3].score | 0.618772566318512 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q1394144 |
| concepts[3].display_name | Automatic summarization |
| concepts[4].id | https://openalex.org/C48145219 |
| concepts[4].level | 2 |
| concepts[4].score | 0.6089869737625122 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q1335365 |
| concepts[4].display_name | Security token |
| concepts[5].id | https://openalex.org/C137293760 |
| concepts[5].level | 2 |
| concepts[5].score | 0.5435717701911926 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q3621696 |
| concepts[5].display_name | Language model |
| concepts[6].id | https://openalex.org/C204321447 |
| concepts[6].level | 1 |
| concepts[6].score | 0.49280837178230286 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[6].display_name | Natural language processing |
| concepts[7].id | https://openalex.org/C155635449 |
| concepts[7].level | 3 |
| concepts[7].score | 0.4548299312591553 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q4674699 |
| concepts[7].display_name | Acoustic model |
| concepts[8].id | https://openalex.org/C154945302 |
| concepts[8].level | 1 |
| concepts[8].score | 0.39188697934150696 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[8].display_name | Artificial intelligence |
| concepts[9].id | https://openalex.org/C61328038 |
| concepts[9].level | 2 |
| concepts[9].score | 0.285081684589386 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q3358061 |
| concepts[9].display_name | Speech processing |
| concepts[10].id | https://openalex.org/C111919701 |
| concepts[10].level | 1 |
| concepts[10].score | 0.0 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q9135 |
| concepts[10].display_name | Operating system |
| concepts[11].id | https://openalex.org/C38652104 |
| concepts[11].level | 1 |
| concepts[11].score | 0.0 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q3510521 |
| concepts[11].display_name | Computer security |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.7731028199195862 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/encoder |
| keywords[1].score | 0.735184371471405 |
| keywords[1].display_name | Encoder |
| keywords[2].id | https://openalex.org/keywords/speech-recognition |
| keywords[2].score | 0.6685544848442078 |
| keywords[2].display_name | Speech recognition |
| keywords[3].id | https://openalex.org/keywords/automatic-summarization |
| keywords[3].score | 0.618772566318512 |
| keywords[3].display_name | Automatic summarization |
| keywords[4].id | https://openalex.org/keywords/security-token |
| keywords[4].score | 0.6089869737625122 |
| keywords[4].display_name | Security token |
| keywords[5].id | https://openalex.org/keywords/language-model |
| keywords[5].score | 0.5435717701911926 |
| keywords[5].display_name | Language model |
| keywords[6].id | https://openalex.org/keywords/natural-language-processing |
| keywords[6].score | 0.49280837178230286 |
| keywords[6].display_name | Natural language processing |
| keywords[7].id | https://openalex.org/keywords/acoustic-model |
| keywords[7].score | 0.4548299312591553 |
| keywords[7].display_name | Acoustic model |
| keywords[8].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[8].score | 0.39188697934150696 |
| keywords[8].display_name | Artificial intelligence |
| keywords[9].id | https://openalex.org/keywords/speech-processing |
| keywords[9].score | 0.285081684589386 |
| keywords[9].display_name | Speech processing |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2307.11795 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | other-oa |
| locations[0].pdf_url | https://arxiv.org/pdf/2307.11795 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | https://openalex.org/licenses/other-oa |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2307.11795 |
| locations[1].id | doi:10.48550/arxiv.2307.11795 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2307.11795 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5006814826 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Yassir Fathullah |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Fathullah, Yassir |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5103012144 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-5796-8288 |
| authorships[1].author.display_name | Chunyang Wu |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Wu, Chunyang |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5045428440 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Egor Lakomkin |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Lakomkin, Egor |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5113970008 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Junteng Jia |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Jia, Junteng |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5047358828 |
| authorships[4].author.orcid | |
| authorships[4].author.display_name | Yuan Shangguan |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Shangguan, Yuan |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5100343555 |
| authorships[5].author.orcid | https://orcid.org/0009-0006-9192-0487 |
| authorships[5].author.display_name | Ke Li |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Li, Ke |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5103232491 |
| authorships[6].author.orcid | https://orcid.org/0000-0001-9563-7351 |
| authorships[6].author.display_name | Jinxi Guo |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Guo, Jinxi |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5110635444 |
| authorships[7].author.orcid | |
| authorships[7].author.display_name | Wenhan Xiong |
| authorships[7].author_position | middle |
| authorships[7].raw_author_name | Xiong, Wenhan |
| authorships[7].is_corresponding | False |
| authorships[8].author.id | https://openalex.org/A5074237839 |
| authorships[8].author.orcid | |
| authorships[8].author.display_name | Jay Mahadeokar |
| authorships[8].author_position | middle |
| authorships[8].raw_author_name | Mahadeokar, Jay |
| authorships[8].is_corresponding | False |
| authorships[9].author.id | https://openalex.org/A5066166549 |
| authorships[9].author.orcid | |
| authorships[9].author.display_name | Ozlem Kalinli |
| authorships[9].author_position | middle |
| authorships[9].raw_author_name | Kalinli, Ozlem |
| authorships[9].is_corresponding | False |
| authorships[10].author.id | https://openalex.org/A5047073253 |
| authorships[10].author.orcid | |
| authorships[10].author.display_name | Christian Fuegen |
| authorships[10].author_position | middle |
| authorships[10].raw_author_name | Fuegen, Christian |
| authorships[10].is_corresponding | False |
| authorships[11].author.id | https://openalex.org/A5113773386 |
| authorships[11].author.orcid | |
| authorships[11].author.display_name | Mike Seltzer |
| authorships[11].author_position | last |
| authorships[11].raw_author_name | Seltzer, Mike |
| authorships[11].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2307.11795 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Prompting Large Language Models with Speech Recognition Abilities |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10201 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9987000226974487 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Speech Recognition and Synthesis |
| related_works | https://openalex.org/W2126322296, https://openalex.org/W2163537793, https://openalex.org/W2916997151, https://openalex.org/W2781555308, https://openalex.org/W3021690593, https://openalex.org/W2125343999, https://openalex.org/W4200200210, https://openalex.org/W2161188302, https://openalex.org/W2888189389, https://openalex.org/W2949174760 |
| cited_by_count | 2 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 1 |
| counts_by_year[1].year | 2024 |
| counts_by_year[1].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2307.11795 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | other-oa |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2307.11795 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | https://openalex.org/licenses/other-oa |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2307.11795 |
| primary_location.id | pmh:oai:arXiv.org:2307.11795 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | other-oa |
| primary_location.pdf_url | https://arxiv.org/pdf/2307.11795 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | https://openalex.org/licenses/other-oa |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2307.11795 |
| publication_date | 2023-07-21 |
| publication_year | 2023 |
| referenced_works_count | 0 |
| abstract_inverted_index.1 | 179 |
| abstract_inverted_index.a | 11, 37, 50, 92 |
| abstract_inverted_index.By | 47 |
| abstract_inverted_index.In | 25 |
| abstract_inverted_index.an | 66 |
| abstract_inverted_index.as | 18, 80 |
| abstract_inverted_index.be | 63, 73, 132 |
| abstract_inverted_index.by | 34, 106 |
| abstract_inverted_index.in | 75, 183 |
| abstract_inverted_index.is | 166, 172 |
| abstract_inverted_index.it | 42, 101 |
| abstract_inverted_index.of | 14, 32, 52, 177 |
| abstract_inverted_index.on | 85, 118, 195 |
| abstract_inverted_index.or | 174 |
| abstract_inverted_index.to | 9, 43, 55, 65, 102, 126, 137, 153, 193 |
| abstract_inverted_index.up | 143, 188 |
| abstract_inverted_index.we | 28, 122 |
| abstract_inverted_index.18% | 107 |
| abstract_inverted_index.ASR | 165 |
| abstract_inverted_index.LLM | 61, 130, 171 |
| abstract_inverted_index.The | 157 |
| abstract_inverted_index.and | 21, 72, 108, 147 |
| abstract_inverted_index.are | 181 |
| abstract_inverted_index.can | 62, 131 |
| abstract_inverted_index.for | 191 |
| abstract_inverted_index.its | 81, 139 |
| abstract_inverted_index.the | 30, 56, 60, 76, 96, 129, 144, 149, 170, 184, 189 |
| abstract_inverted_index.LLMs | 33, 192 |
| abstract_inverted_index.able | 8 |
| abstract_inverted_index.even | 168 |
| abstract_inverted_index.from | 159 |
| abstract_inverted_index.have | 3 |
| abstract_inverted_index.into | 95 |
| abstract_inverted_index.open | 97 |
| abstract_inverted_index.same | 78 |
| abstract_inverted_index.show | 89, 162 |
| abstract_inverted_index.such | 17 |
| abstract_inverted_index.text | 57 |
| abstract_inverted_index.that | 90, 163 |
| abstract_inverted_index.this | 26 |
| abstract_inverted_index.used | 74, 182 |
| abstract_inverted_index.when | 169, 175 |
| abstract_inverted_index.wide | 12 |
| abstract_inverted_index.(ASR) | 70 |
| abstract_inverted_index.(MLS) | 88 |
| abstract_inverted_index.LLaMA | 114 |
| abstract_inverted_index.Large | 0 |
| abstract_inverted_index.audio | 39, 145, 150, 185 |
| abstract_inverted_index.being | 115 |
| abstract_inverted_index.exact | 77 |
| abstract_inverted_index.fewer | 155 |
| abstract_inverted_index.paper | 27 |
| abstract_inverted_index.range | 13 |
| abstract_inverted_index.small | 38 |
| abstract_inverted_index.solve | 10 |
| abstract_inverted_index.text. | 120 |
| abstract_inverted_index.these | 160 |
| abstract_inverted_index.token | 58 |
| abstract_inverted_index.allows | 100 |
| abstract_inverted_index.almost | 178 |
| abstract_inverted_index.audial | 53 |
| abstract_inverted_index.audio. | 197 |
| abstract_inverted_index.during | 135 |
| abstract_inverted_index.extend | 29 |
| abstract_inverted_index.frozen | 134, 173 |
| abstract_inverted_index.highly | 6 |
| abstract_inverted_index.manner | 79 |
| abstract_inverted_index.models | 2 |
| abstract_inverted_index.proven | 4 |
| abstract_inverted_index.second | 180 |
| abstract_inverted_index.speech | 45, 68, 111 |
| abstract_inverted_index.tasks, | 16 |
| abstract_inverted_index.English | 119 |
| abstract_inverted_index.despite | 113 |
| abstract_inverted_index.encoder | 40, 94, 151, 186 |
| abstract_inverted_index.opening | 187 |
| abstract_inverted_index.operate | 194 |
| abstract_inverted_index.perform | 44, 109, 123 |
| abstract_inverted_index.results | 158 |
| abstract_inverted_index.scaling | 142 |
| abstract_inverted_index.sourced | 98 |
| abstract_inverted_index.strides | 176 |
| abstract_inverted_index.studies | 125, 161 |
| abstract_inverted_index.system, | 71 |
| abstract_inverted_index.textual | 82 |
| abstract_inverted_index.trained | 116 |
| abstract_inverted_index.whether | 128 |
| abstract_inverted_index.LLaMA-7B | 99 |
| abstract_inverted_index.ablation | 124 |
| abstract_inverted_index.allowing | 41 |
| abstract_inverted_index.directly | 35, 48 |
| abstract_inverted_index.encoder, | 146 |
| abstract_inverted_index.generate | 154 |
| abstract_inverted_index.language | 1 |
| abstract_inverted_index.maintain | 138 |
| abstract_inverted_index.original | 140 |
| abstract_inverted_index.possible | 167 |
| abstract_inverted_index.question | 23 |
| abstract_inverted_index.sequence | 51 |
| abstract_inverted_index.striding | 152 |
| abstract_inverted_index.training | 136 |
| abstract_inverted_index.attaching | 36 |
| abstract_inverted_index.automatic | 67 |
| abstract_inverted_index.baselines | 105 |
| abstract_inverted_index.conformer | 93 |
| abstract_inverted_index.converted | 64 |
| abstract_inverted_index.flexible, | 7 |
| abstract_inverted_index.long-form | 196 |
| abstract_inverted_index.answering. | 24 |
| abstract_inverted_index.completely | 133 |
| abstract_inverted_index.embeddings | 54 |
| abstract_inverted_index.generative | 15 |
| abstract_inverted_index.increasing | 148 |
| abstract_inverted_index.open-ended | 22 |
| abstract_inverted_index.outperform | 103 |
| abstract_inverted_index.prepending | 49 |
| abstract_inverted_index.themselves | 5 |
| abstract_inverted_index.Experiments | 84 |
| abstract_inverted_index.LibriSpeech | 87 |
| abstract_inverted_index.abstractive | 19 |
| abstract_inverted_index.embeddings, | 59 |
| abstract_inverted_index.embeddings. | 156 |
| abstract_inverted_index.investigate | 127 |
| abstract_inverted_index.monolingual | 104 |
| abstract_inverted_index.possibility | 190 |
| abstract_inverted_index.recognition | 69, 112 |
| abstract_inverted_index.Furthermore, | 121 |
| abstract_inverted_index.Multilingual | 86 |
| abstract_inverted_index.capabilities | 31 |
| abstract_inverted_index.counterpart. | 83 |
| abstract_inverted_index.multilingual | 110, 164 |
| abstract_inverted_index.recognition. | 46 |
| abstract_inverted_index.capabilities, | 141 |
| abstract_inverted_index.incorporating | 91 |
| abstract_inverted_index.summarization | 20 |
| abstract_inverted_index.overwhelmingly | 117 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 12 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/4 |
| sustainable_development_goals[0].score | 0.6200000047683716 |
| sustainable_development_goals[0].display_name | Quality Education |
| citation_normalized_percentile | |
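
The `abstract_inverted_index` rows in the payload store the abstract as a mapping from each word to the positions where it occurs. A minimal sketch of how such a mapping could be turned back into running text is given below; the function name `rebuild_abstract` is hypothetical, and the sample dictionary copies only the first six entries from the table rather than the full index.

```python
from typing import Dict, List


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Rebuild text from a word -> positions mapping."""
    by_position: Dict[int, str] = {}
    for word, positions in inverted_index.items():
        for position in positions:
            by_position[position] = word
    # Emit the words in ascending position order, separated by single spaces.
    return " ".join(by_position[p] for p in sorted(by_position))


# First six entries of the payload above (positions 0 through 5).
sample = {
    "Large": [0],
    "language": [1],
    "models": [2],
    "have": [3],
    "proven": [4],
    "themselves": [5],
}

print(rebuild_abstract(sample))  # Large language models have proven themselves
```

Punctuation stays attached to the words in the index (for example `tasks,` and `audio.`), so joining on spaces should reproduce the abstract as displayed, provided every position is present.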