SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge?
2023 · Open Access · DOI: https://doi.org/10.48550/arxiv.2306.08374
Self-supervised learning (SSL) for speech representation has been successfully applied in various downstream tasks, such as speech and speaker recognition. More recently, speech SSL models have also been shown to be beneficial in advancing spoken language understanding tasks, implying that the SSL models have the potential to learn not only acoustic but also linguistic information. In this paper, we aim to clarify if speech SSL techniques can well capture linguistic knowledge. For this purpose, we introduce SpeechGLUE, a speech version of the General Language Understanding Evaluation (GLUE) benchmark. Since GLUE comprises a variety of natural language understanding tasks, SpeechGLUE can elucidate the degree of linguistic ability of speech SSL models. Experiments demonstrate that speech SSL models, although inferior to text-based SSL models, perform better than baselines, suggesting that they can acquire a certain amount of general linguistic knowledge from just unlabeled speech data.
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2306.08374, https://arxiv.org/pdf/2306.08374
- OA Status: green
- Cited By: 1
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4380993313
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4380993313 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2306.08374 (Digital Object Identifier)
- Title: SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge? (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2023 (year of publication)
- Publication date: 2023-06-14 (full publication date if available)
- Authors: Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka, Yusuke Ijima, Taichi Asami, Marc Delcroix, Yukinori Honma (list of authors in order)
- Landing page: https://arxiv.org/abs/2306.08374 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2306.08374 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2306.08374 (direct OA link when available)
- Concepts: Computer science, Natural language processing, Variety (cybernetics), Artificial intelligence, Benchmark (surveying), Representation (politics), Speech recognition, Law, Geography, Geodesy, Politics, Political science (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 1 (total citation count in OpenAlex)
- Citations by year (recent): 2023: 1 (per-year citation counts, last 5 years)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
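The table that follows lists the record as flattened key paths such as `topics[0].display_name` or `primary_location.source.display_name`. As a rough illustration of how such paths map back onto nested JSON, here is a minimal sketch. The helper name `unflatten` and the handful of sample rows are assumptions chosen for illustration, not part of any OpenAlex tooling, and the sketch assumes key names contain no dots, so the `abstract_inverted_index.*` rows (whose keys include tokens like `data.`) would need separate handling.

```python
import re
from typing import Any

# One path segment: a name, optionally followed by a list index, e.g. "topics[0]".
_SEGMENT = re.compile(r"^([^\[\]]+)(?:\[(\d+)\])?$")


def unflatten(rows: dict[str, Any]) -> dict[str, Any]:
    """Rebuild nested dicts and lists from flattened key paths (hypothetical helper)."""
    root: dict[str, Any] = {}
    for path, value in rows.items():
        node = root
        segments = path.split(".")
        for i, segment in enumerate(segments):
            match = _SEGMENT.match(segment)
            if match is None:
                raise ValueError(f"unsupported path segment: {segment!r}")
            name, index = match.group(1), match.group(2)
            is_last = i == len(segments) - 1
            if index is None:
                # Plain key: either store the value or descend into a dict.
                if is_last:
                    node[name] = value
                else:
                    node = node.setdefault(name, {})
            else:
                # Indexed key: grow the list until the index exists.
                items = node.setdefault(name, [])
                position = int(index)
                while len(items) <= position:
                    items.append({})
                if is_last:
                    items[position] = value
                else:
                    node = items[position]
    return root


# A few rows copied from the payload table; the selection is arbitrary.
rows = {
    "topics[0].display_name": "Speech Recognition and Synthesis",
    "topics[0].score": 0.9973000288009644,
    "topics[1].display_name": "Natural Language Processing Techniques",
    "primary_location.source.display_name": "arXiv (Cornell University)",
    "cited_by_count": 1,
}

record = unflatten(rows)
print(record["topics"][1]["display_name"])
```

Building lists on demand keeps the sketch independent of row order, at the cost of padding with empty dicts when indices arrive out of sequence.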
| id | https://openalex.org/W4380993313 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2306.08374 |
| ids.doi | https://doi.org/10.48550/arxiv.2306.08374 |
| ids.openalex | https://openalex.org/W4380993313 |
| fwci | |
| type | preprint |
| title | SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge? |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10201 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9973000288009644 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Speech Recognition and Synthesis |
| topics[1].id | https://openalex.org/T10181 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9968000054359436 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Natural Language Processing Techniques |
| topics[2].id | https://openalex.org/T12031 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9955999851226807 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Speech and dialogue systems |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.7830330729484558 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C204321447 |
| concepts[1].level | 1 |
| concepts[1].score | 0.6119226813316345 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[1].display_name | Natural language processing |
| concepts[2].id | https://openalex.org/C136197465 |
| concepts[2].level | 2 |
| concepts[2].score | 0.5732904672622681 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q1729295 |
| concepts[2].display_name | Variety (cybernetics) |
| concepts[3].id | https://openalex.org/C154945302 |
| concepts[3].level | 1 |
| concepts[3].score | 0.46567365527153015 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[3].display_name | Artificial intelligence |
| concepts[4].id | https://openalex.org/C185798385 |
| concepts[4].level | 2 |
| concepts[4].score | 0.4559522271156311 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q1161707 |
| concepts[4].display_name | Benchmark (surveying) |
| concepts[5].id | https://openalex.org/C2776359362 |
| concepts[5].level | 3 |
| concepts[5].score | 0.45503589510917664 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q2145286 |
| concepts[5].display_name | Representation (politics) |
| concepts[6].id | https://openalex.org/C28490314 |
| concepts[6].level | 1 |
| concepts[6].score | 0.36045849323272705 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q189436 |
| concepts[6].display_name | Speech recognition |
| concepts[7].id | https://openalex.org/C199539241 |
| concepts[7].level | 1 |
| concepts[7].score | 0.0 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q7748 |
| concepts[7].display_name | Law |
| concepts[8].id | https://openalex.org/C205649164 |
| concepts[8].level | 0 |
| concepts[8].score | 0.0 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q1071 |
| concepts[8].display_name | Geography |
| concepts[9].id | https://openalex.org/C13280743 |
| concepts[9].level | 1 |
| concepts[9].score | 0.0 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q131089 |
| concepts[9].display_name | Geodesy |
| concepts[10].id | https://openalex.org/C94625758 |
| concepts[10].level | 2 |
| concepts[10].score | 0.0 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q7163 |
| concepts[10].display_name | Politics |
| concepts[11].id | https://openalex.org/C17744445 |
| concepts[11].level | 0 |
| concepts[11].score | 0.0 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q36442 |
| concepts[11].display_name | Political science |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.7830330729484558 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/natural-language-processing |
| keywords[1].score | 0.6119226813316345 |
| keywords[1].display_name | Natural language processing |
| keywords[2].id | https://openalex.org/keywords/variety |
| keywords[2].score | 0.5732904672622681 |
| keywords[2].display_name | Variety (cybernetics) |
| keywords[3].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[3].score | 0.46567365527153015 |
| keywords[3].display_name | Artificial intelligence |
| keywords[4].id | https://openalex.org/keywords/benchmark |
| keywords[4].score | 0.4559522271156311 |
| keywords[4].display_name | Benchmark (surveying) |
| keywords[5].id | https://openalex.org/keywords/representation |
| keywords[5].score | 0.45503589510917664 |
| keywords[5].display_name | Representation (politics) |
| keywords[6].id | https://openalex.org/keywords/speech-recognition |
| keywords[6].score | 0.36045849323272705 |
| keywords[6].display_name | Speech recognition |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2306.08374 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2306.08374 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2306.08374 |
| locations[1].id | doi:10.48550/arxiv.2306.08374 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2306.08374 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5033975068 |
| authorships[0].author.orcid | https://orcid.org/0009-0003-4322-4127 |
| authorships[0].author.display_name | Takanori Ashihara |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Ashihara, Takanori |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5087290011 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-1942-7250 |
| authorships[1].author.display_name | Takafumi Moriya |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Moriya, Takafumi |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5104231303 |
| authorships[2].author.orcid | https://orcid.org/0009-0000-0884-2200 |
| authorships[2].author.display_name | Kohei Matsuura |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Matsuura, Kohei |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5007415728 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-8884-9089 |
| authorships[3].author.display_name | Tomohiro Tanaka |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Tanaka, Tomohiro |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5068604686 |
| authorships[4].author.orcid | |
| authorships[4].author.display_name | Yusuke Ijima |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Ijima, Yusuke |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5112536171 |
| authorships[5].author.orcid | |
| authorships[5].author.display_name | Taichi Asami |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Asami, Taichi |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5023868166 |
| authorships[6].author.orcid | https://orcid.org/0000-0002-5175-7834 |
| authorships[6].author.display_name | Marc Delcroix |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Delcroix, Marc |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5112939036 |
| authorships[7].author.orcid | |
| authorships[7].author.display_name | Yukinori Honma |
| authorships[7].author_position | last |
| authorships[7].raw_author_name | Honma, Yukinori |
| authorships[7].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2306.08374 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2023-06-17T00:00:00 |
| display_name | SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge? |
| has_fulltext | True |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10201 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9973000288009644 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Speech Recognition and Synthesis |
| related_works | https://openalex.org/W2378211422, https://openalex.org/W2745001401, https://openalex.org/W4321353415, https://openalex.org/W2130974462, https://openalex.org/W2028665553, https://openalex.org/W2086519370, https://openalex.org/W972276598, https://openalex.org/W2087343574, https://openalex.org/W4246352526, https://openalex.org/W2121910908 |
| cited_by_count | 1 |
| counts_by_year[0].year | 2023 |
| counts_by_year[0].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2306.08374 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2306.08374 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2306.08374 |
| primary_location.id | pmh:oai:arXiv.org:2306.08374 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2306.08374 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2306.08374 |
| publication_date | 2023-06-14 |
| publication_year | 2023 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 77, 91, 131 |
| abstract_inverted_index.In | 55 |
| abstract_inverted_index.as | 15 |
| abstract_inverted_index.be | 30 |
| abstract_inverted_index.if | 62 |
| abstract_inverted_index.in | 10, 32 |
| abstract_inverted_index.of | 80, 93, 103, 106, 134 |
| abstract_inverted_index.to | 29, 46, 60, 118 |
| abstract_inverted_index.we | 58, 74 |
| abstract_inverted_index.For | 71 |
| abstract_inverted_index.SSL | 23, 41, 64, 108, 114, 120 |
| abstract_inverted_index.aim | 59 |
| abstract_inverted_index.and | 17 |
| abstract_inverted_index.but | 51 |
| abstract_inverted_index.can | 66, 99, 129 |
| abstract_inverted_index.for | 3 |
| abstract_inverted_index.has | 6 |
| abstract_inverted_index.not | 48 |
| abstract_inverted_index.the | 40, 44, 81, 101 |
| abstract_inverted_index.GLUE | 89 |
| abstract_inverted_index.More | 20 |
| abstract_inverted_index.also | 26, 52 |
| abstract_inverted_index.been | 7, 27 |
| abstract_inverted_index.from | 138 |
| abstract_inverted_index.have | 25, 43 |
| abstract_inverted_index.just | 139 |
| abstract_inverted_index.only | 49 |
| abstract_inverted_index.such | 14 |
| abstract_inverted_index.than | 124 |
| abstract_inverted_index.that | 39, 112, 127 |
| abstract_inverted_index.they | 128 |
| abstract_inverted_index.this | 56, 72 |
| abstract_inverted_index.well | 67 |
| abstract_inverted_index.(SSL) | 2 |
| abstract_inverted_index.Since | 88 |
| abstract_inverted_index.data. | 142 |
| abstract_inverted_index.learn | 47 |
| abstract_inverted_index.shown | 28 |
| abstract_inverted_index.(GLUE) | 86 |
| abstract_inverted_index.amount | 133 |
| abstract_inverted_index.better | 123 |
| abstract_inverted_index.degree | 102 |
| abstract_inverted_index.models | 24, 42 |
| abstract_inverted_index.paper, | 57 |
| abstract_inverted_index.speech | 4, 16, 22, 63, 78, 107, 113, 141 |
| abstract_inverted_index.spoken | 34 |
| abstract_inverted_index.tasks, | 13, 37, 97 |
| abstract_inverted_index.General | 82 |
| abstract_inverted_index.ability | 105 |
| abstract_inverted_index.acquire | 130 |
| abstract_inverted_index.applied | 9 |
| abstract_inverted_index.capture | 68 |
| abstract_inverted_index.certain | 132 |
| abstract_inverted_index.clarify | 61 |
| abstract_inverted_index.general | 135 |
| abstract_inverted_index.models, | 115, 121 |
| abstract_inverted_index.models. | 109 |
| abstract_inverted_index.natural | 94 |
| abstract_inverted_index.perform | 122 |
| abstract_inverted_index.speaker | 18 |
| abstract_inverted_index.variety | 92 |
| abstract_inverted_index.various | 11 |
| abstract_inverted_index.version | 79 |
| abstract_inverted_index.Language | 83 |
| abstract_inverted_index.acoustic | 50 |
| abstract_inverted_index.although | 116 |
| abstract_inverted_index.implying | 38 |
| abstract_inverted_index.inferior | 117 |
| abstract_inverted_index.language | 35, 95 |
| abstract_inverted_index.learning | 1 |
| abstract_inverted_index.purpose, | 73 |
| abstract_inverted_index.advancing | 33 |
| abstract_inverted_index.comprises | 90 |
| abstract_inverted_index.elucidate | 100 |
| abstract_inverted_index.introduce | 75 |
| abstract_inverted_index.knowledge | 137 |
| abstract_inverted_index.potential | 45 |
| abstract_inverted_index.recently, | 21 |
| abstract_inverted_index.unlabeled | 140 |
| abstract_inverted_index.Evaluation | 85 |
| abstract_inverted_index.SpeechGLUE | 98 |
| abstract_inverted_index.baselines, | 125 |
| abstract_inverted_index.benchmark. | 87 |
| abstract_inverted_index.beneficial | 31 |
| abstract_inverted_index.downstream | 12 |
| abstract_inverted_index.knowledge. | 70 |
| abstract_inverted_index.linguistic | 53, 69, 104, 136 |
| abstract_inverted_index.suggesting | 126 |
| abstract_inverted_index.techniques | 65 |
| abstract_inverted_index.text-based | 119 |
| abstract_inverted_index.Experiments | 110 |
| abstract_inverted_index.SpeechGLUE, | 76 |
| abstract_inverted_index.demonstrate | 111 |
| abstract_inverted_index.information. | 54 |
| abstract_inverted_index.recognition. | 19 |
| abstract_inverted_index.successfully | 8 |
| abstract_inverted_index.Understanding | 84 |
| abstract_inverted_index.understanding | 36, 96 |
| abstract_inverted_index.representation | 5 |
| abstract_inverted_index.Self-supervised | 0 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 8 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/4 |
| sustainable_development_goals[0].score | 0.7900000214576721 |
| sustainable_development_goals[0].display_name | Quality Education |
| citation_normalized_percentile |
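
The `abstract_inverted_index.*` rows above store the abstract as a mapping from each token to the word positions where it occurs (for example, `Self-supervised` at position 0 and `learning` at position 1). A minimal sketch of turning such a mapping back into running text follows. The function name `rebuild_abstract` is hypothetical, and the `excerpt` dictionary is deliberately truncated to positions 0 through 9, so tokens such as `speech` and `been` list only their first occurrence here rather than all positions given in the table.

```python
def rebuild_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Turn a token -> positions mapping back into text (hypothetical helper)."""
    by_position: dict[int, str] = {}
    for token, positions in inverted_index.items():
        for position in positions:
            by_position[position] = token
    # Emit tokens in ascending position order, separated by single spaces.
    return " ".join(by_position[p] for p in sorted(by_position))


# Excerpt of the index from the table, truncated to positions 0-9 (assumption:
# later positions of repeated tokens are omitted to keep the example short).
excerpt = {
    "Self-supervised": [0],
    "learning": [1],
    "(SSL)": [2],
    "for": [3],
    "speech": [4],
    "representation": [5],
    "has": [6],
    "been": [7],
    "successfully": [8],
    "applied": [9],
}

opening = rebuild_abstract(excerpt)
print(opening)
```

Applied to the full index, the same loop should reproduce the abstract shown at the top of the record, since punctuation is stored as part of each token.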