Improving Joint Speech-Text Representations Without Alignment
2023 · Open Access
· DOI: https://doi.org/10.48550/arxiv.2308.06125
The last year has seen astonishing progress in text-prompted image generation premised on the idea of a cross-modal representation space in which the text and image domains are represented jointly. In ASR, this idea has found application as joint speech-text encoders that can scale to the capacities of very large parameter models by being trained on both unpaired speech and text. While these methods show promise, they have required special treatment of the sequence-length mismatch inherent in speech and text, either by up-sampling heuristics or an explicit alignment model. In this work, we offer evidence that joint speech-text encoders naturally achieve consistent representations across modalities by disregarding sequence length, and argue that consistency losses could forgive length differences and simply assume the best alignment. We show that such a loss improves downstream WER in both a large-parameter monolingual and multilingual system.
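To make the idea of a consistency loss that "simply assumes the best alignment" concrete, here is a minimal sketch. It is not taken from the paper: the function name `best_alignment_loss`, the squared Euclidean distance, and the restriction to monotonic alignments where every speech frame maps to exactly one text token are all assumptions made for illustration. The dynamic program finds the cheapest way to stretch the shorter text sequence over the longer speech sequence and returns the mean per-frame distance under that alignment.

```python
def best_alignment_loss(speech, text):
    """Mean squared distance under the cheapest monotonic alignment.

    speech: list of T frame vectors (hypothetical speech-encoder outputs)
    text:   list of U token vectors (hypothetical text-encoder outputs), U <= T
    Each frame is assigned to one token, tokens are visited in order,
    and every token receives at least one frame.
    """
    T, U = len(speech), len(text)
    if U == 0 or U > T:
        raise ValueError("need 1 <= len(text) <= len(speech)")

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    INF = float("inf")
    # prev[u] = cheapest cost of aligning frames 0..t-1 with the last frame on token u
    prev = [INF] * U
    prev[0] = sq_dist(speech[0], text[0])
    for t in range(1, T):
        cur = [INF] * U
        for u in range(min(t + 1, U)):
            stay = prev[u]                            # frame t stays on token u
            advance = prev[u - 1] if u > 0 else INF   # frame t moves on to token u
            cur[u] = sq_dist(speech[t], text[u]) + min(stay, advance)
        prev = cur
    return prev[U - 1] / T


# A text sequence and a "stretched" copy of it should be perfectly consistent.
tokens = [[0.0, 0.0], [1.0, 1.0]]
frames = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
print(best_alignment_loss(frames, tokens))  # 0.0
```

The point of the sketch is that no up-sampling heuristic or separate alignment model is needed to compare sequences of different lengths. A training implementation would need a differentiable tensor version of the same recursion.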
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2308.06125 (PDF: https://arxiv.org/pdf/2308.06125)
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4385825615
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4385825615 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2308.06125 (Digital Object Identifier)
- Title: Improving Joint Speech-Text Representations Without Alignment (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2023 (year of publication)
- Publication date: 2023-08-11 (full publication date if available)
- Authors: Cal Peyser, Zhong Meng, Ke Hu, Rohit Prabhavalkar, Andrew E. Rosenberg, Tara N. Sainath, Michael Picheny, Kyunghyun Cho (list of authors in order)
- Landing page: https://arxiv.org/abs/2308.06125 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2308.06125 (direct link to full-text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2308.06125 (direct OA link when available)
- Concepts: Computer science, Encoder, Heuristics, Speech recognition, Joint (building), Representation (politics), Sequence (biology), Consistency (knowledge bases), Sampling (signal processing), Natural language processing, Artificial intelligence, Computer vision, Filter (signal processing), Law, Political science, Engineering, Operating system, Genetics, Biology, Politics, Architectural engineering (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4385825615 |
| doi | https://doi.org/10.48550/arxiv.2308.06125 |
| ids.doi | https://doi.org/10.48550/arxiv.2308.06125 |
| ids.openalex | https://openalex.org/W4385825615 |
| fwci | 0.0 |
| type | preprint |
| title | Improving Joint Speech-Text Representations Without Alignment |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11714 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9975000023841858 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1707 |
| topics[0].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[0].display_name | Multimodal Machine Learning Applications |
| topics[1].id | https://openalex.org/T10201 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9918000102043152 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Speech Recognition and Synthesis |
| topics[2].id | https://openalex.org/T10860 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9872000217437744 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1711 |
| topics[2].subfield.display_name | Signal Processing |
| topics[2].display_name | Speech and Audio Processing |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.7180798053741455 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C118505674 |
| concepts[1].level | 2 |
| concepts[1].score | 0.6789342761039734 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q42586063 |
| concepts[1].display_name | Encoder |
| concepts[2].id | https://openalex.org/C127705205 |
| concepts[2].level | 2 |
| concepts[2].score | 0.6676328778266907 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q5748245 |
| concepts[2].display_name | Heuristics |
| concepts[3].id | https://openalex.org/C28490314 |
| concepts[3].level | 1 |
| concepts[3].score | 0.6250148415565491 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q189436 |
| concepts[3].display_name | Speech recognition |
| concepts[4].id | https://openalex.org/C18555067 |
| concepts[4].level | 2 |
| concepts[4].score | 0.5762699842453003 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q8375051 |
| concepts[4].display_name | Joint (building) |
| concepts[5].id | https://openalex.org/C2776359362 |
| concepts[5].level | 3 |
| concepts[5].score | 0.5183683037757874 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q2145286 |
| concepts[5].display_name | Representation (politics) |
| concepts[6].id | https://openalex.org/C2778112365 |
| concepts[6].level | 2 |
| concepts[6].score | 0.5054285526275635 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q3511065 |
| concepts[6].display_name | Sequence (biology) |
| concepts[7].id | https://openalex.org/C2776436953 |
| concepts[7].level | 2 |
| concepts[7].score | 0.4518173038959503 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q5163215 |
| concepts[7].display_name | Consistency (knowledge bases) |
| concepts[8].id | https://openalex.org/C140779682 |
| concepts[8].level | 3 |
| concepts[8].score | 0.4341930150985718 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q210868 |
| concepts[8].display_name | Sampling (signal processing) |
| concepts[9].id | https://openalex.org/C204321447 |
| concepts[9].level | 1 |
| concepts[9].score | 0.42132604122161865 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[9].display_name | Natural language processing |
| concepts[10].id | https://openalex.org/C154945302 |
| concepts[10].level | 1 |
| concepts[10].score | 0.4194837510585785 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[10].display_name | Artificial intelligence |
| concepts[11].id | https://openalex.org/C31972630 |
| concepts[11].level | 1 |
| concepts[11].score | 0.13222596049308777 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q844240 |
| concepts[11].display_name | Computer vision |
| concepts[12].id | https://openalex.org/C106131492 |
| concepts[12].level | 2 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q3072260 |
| concepts[12].display_name | Filter (signal processing) |
| concepts[13].id | https://openalex.org/C199539241 |
| concepts[13].level | 1 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q7748 |
| concepts[13].display_name | Law |
| concepts[14].id | https://openalex.org/C17744445 |
| concepts[14].level | 0 |
| concepts[14].score | 0.0 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q36442 |
| concepts[14].display_name | Political science |
| concepts[15].id | https://openalex.org/C127413603 |
| concepts[15].level | 0 |
| concepts[15].score | 0.0 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q11023 |
| concepts[15].display_name | Engineering |
| concepts[16].id | https://openalex.org/C111919701 |
| concepts[16].level | 1 |
| concepts[16].score | 0.0 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q9135 |
| concepts[16].display_name | Operating system |
| concepts[17].id | https://openalex.org/C54355233 |
| concepts[17].level | 1 |
| concepts[17].score | 0.0 |
| concepts[17].wikidata | https://www.wikidata.org/wiki/Q7162 |
| concepts[17].display_name | Genetics |
| concepts[18].id | https://openalex.org/C86803240 |
| concepts[18].level | 0 |
| concepts[18].score | 0.0 |
| concepts[18].wikidata | https://www.wikidata.org/wiki/Q420 |
| concepts[18].display_name | Biology |
| concepts[19].id | https://openalex.org/C94625758 |
| concepts[19].level | 2 |
| concepts[19].score | 0.0 |
| concepts[19].wikidata | https://www.wikidata.org/wiki/Q7163 |
| concepts[19].display_name | Politics |
| concepts[20].id | https://openalex.org/C170154142 |
| concepts[20].level | 1 |
| concepts[20].score | 0.0 |
| concepts[20].wikidata | https://www.wikidata.org/wiki/Q150737 |
| concepts[20].display_name | Architectural engineering |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.7180798053741455 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/encoder |
| keywords[1].score | 0.6789342761039734 |
| keywords[1].display_name | Encoder |
| keywords[2].id | https://openalex.org/keywords/heuristics |
| keywords[2].score | 0.6676328778266907 |
| keywords[2].display_name | Heuristics |
| keywords[3].id | https://openalex.org/keywords/speech-recognition |
| keywords[3].score | 0.6250148415565491 |
| keywords[3].display_name | Speech recognition |
| keywords[4].id | https://openalex.org/keywords/joint |
| keywords[4].score | 0.5762699842453003 |
| keywords[4].display_name | Joint (building) |
| keywords[5].id | https://openalex.org/keywords/representation |
| keywords[5].score | 0.5183683037757874 |
| keywords[5].display_name | Representation (politics) |
| keywords[6].id | https://openalex.org/keywords/sequence |
| keywords[6].score | 0.5054285526275635 |
| keywords[6].display_name | Sequence (biology) |
| keywords[7].id | https://openalex.org/keywords/consistency |
| keywords[7].score | 0.4518173038959503 |
| keywords[7].display_name | Consistency (knowledge bases) |
| keywords[8].id | https://openalex.org/keywords/sampling |
| keywords[8].score | 0.4341930150985718 |
| keywords[8].display_name | Sampling (signal processing) |
| keywords[9].id | https://openalex.org/keywords/natural-language-processing |
| keywords[9].score | 0.42132604122161865 |
| keywords[9].display_name | Natural language processing |
| keywords[10].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[10].score | 0.4194837510585785 |
| keywords[10].display_name | Artificial intelligence |
| keywords[11].id | https://openalex.org/keywords/computer-vision |
| keywords[11].score | 0.13222596049308777 |
| keywords[11].display_name | Computer vision |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2308.06125 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | cc-by |
| locations[0].pdf_url | https://arxiv.org/pdf/2308.06125 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | https://openalex.org/licenses/cc-by |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2308.06125 |
| locations[1].id | doi:10.48550/arxiv.2308.06125 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article-journal |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2308.06125 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5037066965 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Cal Peyser |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Peyser, Cal |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5101749753 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-7814-5929 |
| authorships[1].author.display_name | Zhong Meng |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Meng, Zhong |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5029338576 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-1599-1519 |
| authorships[2].author.display_name | Ke Hu |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Hu, Ke |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5032640894 |
| authorships[3].author.orcid | https://orcid.org/0000-0001-5331-6058 |
| authorships[3].author.display_name | Rohit Prabhavalkar |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Prabhavalkar, Rohit |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5024913801 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-4536-8824 |
| authorships[4].author.display_name | Andrew E. Rosenberg |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Rosenberg, Andrew |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5070513394 |
| authorships[5].author.orcid | https://orcid.org/0000-0002-4126-6556 |
| authorships[5].author.display_name | Tara N. Sainath |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Sainath, Tara N. |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5034529775 |
| authorships[6].author.orcid | |
| authorships[6].author.display_name | Michael Picheny |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Picheny, Michael |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5091175785 |
| authorships[7].author.orcid | https://orcid.org/0000-0003-1669-3211 |
| authorships[7].author.display_name | Kyunghyun Cho |
| authorships[7].author_position | last |
| authorships[7].raw_author_name | Cho, Kyunghyun |
| authorships[7].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2308.06125 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Improving Joint Speech-Text Representations Without Alignment |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11714 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9975000023841858 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1707 |
| primary_topic.subfield.display_name | Computer Vision and Pattern Recognition |
| primary_topic.display_name | Multimodal Machine Learning Applications |
| related_works | https://openalex.org/W2280422768, https://openalex.org/W3143197806, https://openalex.org/W4252555497, https://openalex.org/W3121175838, https://openalex.org/W3016293053, https://openalex.org/W1690653314, https://openalex.org/W2401723157, https://openalex.org/W2065055572, https://openalex.org/W2784269775, https://openalex.org/W2952904874 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2308.06125 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | cc-by |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2308.06125 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2308.06125 |
| primary_location.id | pmh:oai:arXiv.org:2308.06125 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | cc-by |
| primary_location.pdf_url | https://arxiv.org/pdf/2308.06125 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | https://openalex.org/licenses/cc-by |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2308.06125 |
| publication_date | 2023-08-11 |
| publication_year | 2023 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 16, 128, 135 |
| abstract_inverted_index.In | 30, 89 |
| abstract_inverted_index.We | 124 |
| abstract_inverted_index.an | 85 |
| abstract_inverted_index.as | 37 |
| abstract_inverted_index.by | 52, 81, 105 |
| abstract_inverted_index.in | 7, 20, 76, 133 |
| abstract_inverted_index.of | 15, 47, 71 |
| abstract_inverted_index.on | 12, 55 |
| abstract_inverted_index.or | 84 |
| abstract_inverted_index.to | 44 |
| abstract_inverted_index.we | 92 |
| abstract_inverted_index.The | 0 |
| abstract_inverted_index.WER | 132 |
| abstract_inverted_index.and | 24, 59, 78, 109, 118, 138 |
| abstract_inverted_index.are | 27 |
| abstract_inverted_index.can | 42 |
| abstract_inverted_index.has | 3, 34 |
| abstract_inverted_index.the | 13, 22, 45, 72, 121 |
| abstract_inverted_index.ASR, | 31 |
| abstract_inverted_index.best | 122 |
| abstract_inverted_index.both | 56, 134 |
| abstract_inverted_index.have | 67 |
| abstract_inverted_index.idea | 14, 33 |
| abstract_inverted_index.last | 1 |
| abstract_inverted_index.loss | 129 |
| abstract_inverted_index.seen | 4 |
| abstract_inverted_index.show | 64, 125 |
| abstract_inverted_index.such | 127 |
| abstract_inverted_index.text | 23 |
| abstract_inverted_index.that | 41, 95, 111, 126 |
| abstract_inverted_index.they | 66 |
| abstract_inverted_index.this | 32, 90 |
| abstract_inverted_index.very | 48 |
| abstract_inverted_index.year | 2 |
| abstract_inverted_index.While | 61 |
| abstract_inverted_index.argue | 110 |
| abstract_inverted_index.being | 53 |
| abstract_inverted_index.could | 114 |
| abstract_inverted_index.found | 35 |
| abstract_inverted_index.image | 9, 25 |
| abstract_inverted_index.joint | 38, 96 |
| abstract_inverted_index.large | 49 |
| abstract_inverted_index.offer | 93 |
| abstract_inverted_index.scale | 43 |
| abstract_inverted_index.space | 19 |
| abstract_inverted_index.text, | 79 |
| abstract_inverted_index.text. | 60 |
| abstract_inverted_index.these | 62 |
| abstract_inverted_index.which | 21 |
| abstract_inverted_index.work, | 91 |
| abstract_inverted_index.across | 103 |
| abstract_inverted_index.assume | 120 |
| abstract_inverted_index.either | 80 |
| abstract_inverted_index.length | 116 |
| abstract_inverted_index.losses | 113 |
| abstract_inverted_index.model. | 88 |
| abstract_inverted_index.models | 51 |
| abstract_inverted_index.simply | 119 |
| abstract_inverted_index.speech | 58, 77 |
| abstract_inverted_index.achieve | 100 |
| abstract_inverted_index.domains | 26 |
| abstract_inverted_index.forgive | 115 |
| abstract_inverted_index.length, | 108 |
| abstract_inverted_index.methods | 63 |
| abstract_inverted_index.special | 69 |
| abstract_inverted_index.system. | 140 |
| abstract_inverted_index.trained | 54 |
| abstract_inverted_index.encoders | 40, 98 |
| abstract_inverted_index.evidence | 94 |
| abstract_inverted_index.explicit | 86 |
| abstract_inverted_index.improves | 130 |
| abstract_inverted_index.inherent | 75 |
| abstract_inverted_index.jointly. | 29 |
| abstract_inverted_index.mismatch | 74 |
| abstract_inverted_index.premised | 11 |
| abstract_inverted_index.progress | 6 |
| abstract_inverted_index.promise, | 65 |
| abstract_inverted_index.required | 68 |
| abstract_inverted_index.sequence | 107 |
| abstract_inverted_index.unpaired | 57 |
| abstract_inverted_index.alignment | 87 |
| abstract_inverted_index.naturally | 99 |
| abstract_inverted_index.parameter | 50 |
| abstract_inverted_index.treatment | 70 |
| abstract_inverted_index.alignment. | 123 |
| abstract_inverted_index.capacities | 46 |
| abstract_inverted_index.consistent | 101 |
| abstract_inverted_index.downstream | 131 |
| abstract_inverted_index.generation | 10 |
| abstract_inverted_index.heuristics | 83 |
| abstract_inverted_index.modalities | 104 |
| abstract_inverted_index.application | 36 |
| abstract_inverted_index.astonishing | 5 |
| abstract_inverted_index.consistency | 112 |
| abstract_inverted_index.cross-modal | 17 |
| abstract_inverted_index.differences | 117 |
| abstract_inverted_index.monolingual | 137 |
| abstract_inverted_index.represented | 28 |
| abstract_inverted_index.speech-text | 39, 97 |
| abstract_inverted_index.up-sampling | 82 |
| abstract_inverted_index.disregarding | 106 |
| abstract_inverted_index.multilingual | 139 |
| abstract_inverted_index.text-prompted | 8 |
| abstract_inverted_index.representation | 18 |
| abstract_inverted_index.large-parameter | 136 |
| abstract_inverted_index.representations | 102 |
| abstract_inverted_index.sequence-length | 73 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 8 |
| citation_normalized_percentile.value | 0.10745584 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | False |
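
The `abstract_inverted_index.*` rows in the table above store the abstract as a mapping from each word to the zero-based positions where it occurs. As a rough illustration of how such a mapping can be turned back into running text, the sketch below inverts it again. The function name `rebuild_abstract` is hypothetical, and the `sample` dictionary is a hand-copied excerpt limited to positions 0 through 7; entries such as `has` and `in` have further positions in the full payload that are omitted here.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a {word: [positions]} mapping."""
    by_position = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = word
    # Join the words in position order; missing positions are simply skipped.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Excerpt of the payload above, truncated to positions 0..7 for illustration.
sample = {
    "The": [0],
    "last": [1],
    "year": [2],
    "has": [3],
    "seen": [4],
    "astonishing": [5],
    "progress": [6],
    "in": [7],
}
print(rebuild_abstract(sample))  # The last year has seen astonishing progress in
```

Applied to the complete set of rows, the same function should reproduce the abstract shown at the top of the page, since punctuation is stored as part of each word (for example `ASR,` and `text.`).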