Accented sentence and word recognition: Humans versus whisper automatic speech recognition
2024 · Open Access
· DOI: https://doi.org/10.1121/10.0035071
Despite advancements in speech recognition technology, questions remain about model generalizability and how much models mirror human perception. These questions are addressed by comparing OpenAI's Whisper model and 75 human transcribers on 300 English sentences (20 speakers, half F, half M, half US-accented, half [Mexican-]Spanish-accented). Sentences ended in 100 target words, with ⅓ high-predictability sentences (The farmer milked the cows) and ⅔ varying degrees of low-predictability (The farmer/barmer milked the nose). Target-word error rate (WER) was examined for final words in sentences and for isolated final words (recordings excised from same sentences). WER decreased with increasing model size, but was higher for Spanish-accented than US-accented speech, suggesting imperfect generalizability. Both models and humans benefited from more-predictable vs. less-predictable sentences. However, using isolated-word WERs as a baseline revealed that sentence context affected models and humans differently: humans benefited only from high-predictability sentences, while models benefited somewhat from any sentence context. Humans outperformed models on isolated words, suggesting that Whisper may have a restricted distribution of single-word utterances or may need lengthier acoustic context than humans. Findings suggest that more inclusive, varied training data may yield more generalizable ASR. Potential for using ASR to model human speech adaptability is discussed.
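The target-word error rate described in the abstract can be illustrated with a minimal sketch. It assumes a trial counts as an error whenever the last word of a transcript differs from the target after lowercasing and punctuation stripping; the authors' actual scoring rules are not given here. The function names and the toy transcripts are hypothetical and are not data from the study.

```python
import string

# Translation table that deletes ASCII punctuation (an assumed normalization step).
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def final_word(transcript: str) -> str:
    """Return the last word of a transcript, lowercased, without punctuation."""
    words = transcript.lower().translate(_STRIP_PUNCT).split()
    return words[-1] if words else ""


def target_word_error_rate(trials) -> float:
    """Fraction of (target, transcript) trials whose final word misses the target."""
    trials = list(trials)
    if not trials:
        raise ValueError("at least one trial is required")
    errors = sum(
        1 for target, transcript in trials
        if final_word(transcript) != target.lower()
    )
    return errors / len(trials)


# Toy transcripts invented for illustration only.
sentence_trials = [
    ("cows", "The farmer milked the cows."),
    ("nose", "The farmer milked the nose"),
    ("nose", "The barmer milked the rose"),
]
isolated_trials = [
    ("cows", "cow"),
    ("nose", "knows"),
    ("nose", "nose"),
]

sentence_wer = target_word_error_rate(sentence_trials)
isolated_wer = target_word_error_rate(isolated_trials)

# Positive values mean sentence context lowered the error rate relative
# to the isolated-word baseline.
context_benefit = isolated_wer - sentence_wer
```

Subtracting the sentence-final rate from the isolated-word rate mirrors the baseline comparison in the abstract: the same difference can be computed separately for high- and low-predictability sentences, and separately for human and model transcripts.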
Related Topics
- Type
- article
- Language
- en
- Landing Page
- https://doi.org/10.1121/10.0035071
- OA Status
- gold
- Related Works
- 10
- OpenAlex ID
- https://openalex.org/W4406369569
Raw OpenAlex JSON
- OpenAlex ID
- https://openalex.org/W4406369569 (canonical identifier for this work in OpenAlex)
- DOI
- https://doi.org/10.1121/10.0035071 (Digital Object Identifier)
- Title
- Accented sentence and word recognition: Humans versus whisper automatic speech recognition (work title)
- Type
- article (OpenAlex work type)
- Language
- en (primary language)
- Publication year
- 2024 (year of publication)
- Publication date
- 2024-10-01 (full publication date if available)
- Authors
- Junrong Chen, Julia Kwong, Sarah C. Creel (list of authors in order)
- Landing page
- https://doi.org/10.1121/10.0035071 (publisher landing page)
- Open access
- Yes (whether a free full text is available)
- OA status
- gold (open access status per OpenAlex)
- OA URL
- https://escholarship.org/uc/item/1jj98794 (direct OA link when available)
- Concepts
- Speech recognition, Word (group theory), Sentence, Computer science, Word recognition, Natural language processing, Linguistics, Artificial intelligence, Reading (process), Philosophy (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by
- 0 (total citation count in OpenAlex)
- Related works (count)
- 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4406369569 |
|---|---|
| doi | https://doi.org/10.1121/10.0035071 |
| ids.doi | https://doi.org/10.1121/10.0035071 |
| ids.openalex | https://openalex.org/W4406369569 |
| fwci | 0.0 |
| type | article |
| title | Accented sentence and word recognition: Humans versus whisper automatic speech recognition |
| biblio.issue | 4_Supplement |
| biblio.volume | 156 |
| biblio.last_page | A50 |
| biblio.first_page | A50 |
| topics[0].id | https://openalex.org/T12262 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.7660999894142151 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Hate Speech and Cyberbullying Detection |
| topics[1].id | https://openalex.org/T12151 |
| topics[1].field.id | https://openalex.org/fields/36 |
| topics[1].field.display_name | Health Professions |
| topics[1].score | 0.6830000281333923 |
| topics[1].domain.id | https://openalex.org/domains/4 |
| topics[1].domain.display_name | Health Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/3600 |
| topics[1].subfield.display_name | General Health Professions |
| topics[1].display_name | Interpreting and Communication in Healthcare |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C28490314 |
| concepts[0].level | 1 |
| concepts[0].score | 0.6372861266136169 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q189436 |
| concepts[0].display_name | Speech recognition |
| concepts[1].id | https://openalex.org/C90805587 |
| concepts[1].level | 2 |
| concepts[1].score | 0.6368911862373352 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q10944557 |
| concepts[1].display_name | Word (group theory) |
| concepts[2].id | https://openalex.org/C2777530160 |
| concepts[2].level | 2 |
| concepts[2].score | 0.6235555410385132 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q41796 |
| concepts[2].display_name | Sentence |
| concepts[3].id | https://openalex.org/C41008148 |
| concepts[3].level | 0 |
| concepts[3].score | 0.5919587016105652 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[3].display_name | Computer science |
| concepts[4].id | https://openalex.org/C150856459 |
| concepts[4].level | 3 |
| concepts[4].score | 0.5372931957244873 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q8034367 |
| concepts[4].display_name | Word recognition |
| concepts[5].id | https://openalex.org/C204321447 |
| concepts[5].level | 1 |
| concepts[5].score | 0.4637359380722046 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[5].display_name | Natural language processing |
| concepts[6].id | https://openalex.org/C41895202 |
| concepts[6].level | 1 |
| concepts[6].score | 0.3791053891181946 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q8162 |
| concepts[6].display_name | Linguistics |
| concepts[7].id | https://openalex.org/C154945302 |
| concepts[7].level | 1 |
| concepts[7].score | 0.37537169456481934 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[7].display_name | Artificial intelligence |
| concepts[8].id | https://openalex.org/C554936623 |
| concepts[8].level | 2 |
| concepts[8].score | 0.05876368284225464 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q199657 |
| concepts[8].display_name | Reading (process) |
| concepts[9].id | https://openalex.org/C138885662 |
| concepts[9].level | 0 |
| concepts[9].score | 0.0 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q5891 |
| concepts[9].display_name | Philosophy |
| keywords[0].id | https://openalex.org/keywords/speech-recognition |
| keywords[0].score | 0.6372861266136169 |
| keywords[0].display_name | Speech recognition |
| keywords[1].id | https://openalex.org/keywords/word |
| keywords[1].score | 0.6368911862373352 |
| keywords[1].display_name | Word (group theory) |
| keywords[2].id | https://openalex.org/keywords/sentence |
| keywords[2].score | 0.6235555410385132 |
| keywords[2].display_name | Sentence |
| keywords[3].id | https://openalex.org/keywords/computer-science |
| keywords[3].score | 0.5919587016105652 |
| keywords[3].display_name | Computer science |
| keywords[4].id | https://openalex.org/keywords/word-recognition |
| keywords[4].score | 0.5372931957244873 |
| keywords[4].display_name | Word recognition |
| keywords[5].id | https://openalex.org/keywords/natural-language-processing |
| keywords[5].score | 0.4637359380722046 |
| keywords[5].display_name | Natural language processing |
| keywords[6].id | https://openalex.org/keywords/linguistics |
| keywords[6].score | 0.3791053891181946 |
| keywords[6].display_name | Linguistics |
| keywords[7].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[7].score | 0.37537169456481934 |
| keywords[7].display_name | Artificial intelligence |
| keywords[8].id | https://openalex.org/keywords/reading |
| keywords[8].score | 0.05876368284225464 |
| keywords[8].display_name | Reading (process) |
| language | en |
| locations[0].id | doi:10.1121/10.0035071 |
| locations[0].is_oa | False |
| locations[0].source.id | https://openalex.org/S11296630 |
| locations[0].source.issn | 0001-4966, 1520-8524, 1520-9024 |
| locations[0].source.type | journal |
| locations[0].source.is_oa | False |
| locations[0].source.issn_l | 0001-4966 |
| locations[0].source.is_core | True |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | The Journal of the Acoustical Society of America |
| locations[0].source.host_organization | https://openalex.org/P4310320226 |
| locations[0].source.host_organization_name | Acoustical Society of America |
| locations[0].source.host_organization_lineage | https://openalex.org/P4310320226 |
| locations[0].source.host_organization_lineage_names | Acoustical Society of America |
| locations[0].license | |
| locations[0].pdf_url | |
| locations[0].version | publishedVersion |
| locations[0].raw_type | journal-article |
| locations[0].license_id | |
| locations[0].is_accepted | True |
| locations[0].is_published | True |
| locations[0].raw_source_name | The Journal of the Acoustical Society of America |
| locations[0].landing_page_url | https://doi.org/10.1121/10.0035071 |
| locations[1].id | pmh:oai:escholarship.org:ark:/13030/qt1jj98794 |
| locations[1].is_oa | True |
| locations[1].source | |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | submittedVersion |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | False |
| locations[1].raw_source_name | The Journal of the Acoustical Society of America, vol 156, iss 4_Supplement |
| locations[1].landing_page_url | https://escholarship.org/uc/item/1jj98794 |
| indexed_in | crossref |
| authorships[0].author.id | https://openalex.org/A5079484644 |
| authorships[0].author.orcid | https://orcid.org/0009-0007-4048-3667 |
| authorships[0].author.display_name | Junrong Chen |
| authorships[0].countries | US |
| authorships[0].affiliations[0].institution_ids | https://openalex.org/I36258959 |
| authorships[0].affiliations[0].raw_affiliation_string | Cognitive Science, Univ. of California, San Diego, 9450 Gilman Dr., La Jolla, CA 92092 |
| authorships[0].institutions[0].id | https://openalex.org/I36258959 |
| authorships[0].institutions[0].ror | https://ror.org/0168r3w48 |
| authorships[0].institutions[0].type | education |
| authorships[0].institutions[0].lineage | https://openalex.org/I36258959 |
| authorships[0].institutions[0].country_code | US |
| authorships[0].institutions[0].display_name | University of California, San Diego |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Junrong Chen |
| authorships[0].is_corresponding | False |
| authorships[0].raw_affiliation_strings | Cognitive Science, Univ. of California, San Diego, 9450 Gilman Dr., La Jolla, CA 92092 |
| authorships[1].author.id | https://openalex.org/A5111603042 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Julia Kwong |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Julia Kwong |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5019356280 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-1891-4228 |
| authorships[2].author.display_name | Sarah C. Creel |
| authorships[2].countries | US |
| authorships[2].affiliations[0].institution_ids | https://openalex.org/I36258959 |
| authorships[2].affiliations[0].raw_affiliation_string | Cognitive Science, Univ. of California, San Diego, La Jolla, CA |
| authorships[2].institutions[0].id | https://openalex.org/I36258959 |
| authorships[2].institutions[0].ror | https://ror.org/0168r3w48 |
| authorships[2].institutions[0].type | education |
| authorships[2].institutions[0].lineage | https://openalex.org/I36258959 |
| authorships[2].institutions[0].country_code | US |
| authorships[2].institutions[0].display_name | University of California, San Diego |
| authorships[2].author_position | last |
| authorships[2].raw_author_name | Sarah C. Creel |
| authorships[2].is_corresponding | False |
| authorships[2].raw_affiliation_strings | Cognitive Science, Univ. of California, San Diego, La Jolla, CA |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://escholarship.org/uc/item/1jj98794 |
| open_access.oa_status | gold |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Accented sentence and word recognition: Humans versus whisper automatic speech recognition |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T03:46:38.306776 |
| primary_topic.id | https://openalex.org/T12262 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.7660999894142151 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Hate Speech and Cyberbullying Detection |
| related_works | https://openalex.org/W2296205523, https://openalex.org/W2112800125, https://openalex.org/W178422364, https://openalex.org/W2069374145, https://openalex.org/W2401522294, https://openalex.org/W1962828410, https://openalex.org/W2358212252, https://openalex.org/W3197877226, https://openalex.org/W2382772676, https://openalex.org/W2158882055 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:escholarship.org:ark:/13030/qt1jj98794 |
| best_oa_location.is_oa | True |
| best_oa_location.source | |
| best_oa_location.license | cc-by |
| best_oa_location.pdf_url | |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | article |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | The Journal of the Acoustical Society of America, vol 156, iss 4_Supplement |
| best_oa_location.landing_page_url | https://escholarship.org/uc/item/1jj98794 |
| primary_location.id | doi:10.1121/10.0035071 |
| primary_location.is_oa | False |
| primary_location.source.id | https://openalex.org/S11296630 |
| primary_location.source.issn | 0001-4966, 1520-8524, 1520-9024 |
| primary_location.source.type | journal |
| primary_location.source.is_oa | False |
| primary_location.source.issn_l | 0001-4966 |
| primary_location.source.is_core | True |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | The Journal of the Acoustical Society of America |
| primary_location.source.host_organization | https://openalex.org/P4310320226 |
| primary_location.source.host_organization_name | Acoustical Society of America |
| primary_location.source.host_organization_lineage | https://openalex.org/P4310320226 |
| primary_location.source.host_organization_lineage_names | Acoustical Society of America |
| primary_location.license | |
| primary_location.pdf_url | |
| primary_location.version | publishedVersion |
| primary_location.raw_type | journal-article |
| primary_location.license_id | |
| primary_location.is_accepted | True |
| primary_location.is_published | True |
| primary_location.raw_source_name | The Journal of the Acoustical Society of America |
| primary_location.landing_page_url | https://doi.org/10.1121/10.0035071 |
| publication_date | 2024-10-01 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 124, 160 |
| abstract_inverted_index.75 | 28 |
| abstract_inverted_index.F, | 38 |
| abstract_inverted_index.M, | 40 |
| abstract_inverted_index.as | 123 |
| abstract_inverted_index.by | 22 |
| abstract_inverted_index.in | 2, 47, 80 |
| abstract_inverted_index.is | 196 |
| abstract_inverted_index.of | 64, 163 |
| abstract_inverted_index.on | 31, 152 |
| abstract_inverted_index.or | 166 |
| abstract_inverted_index.to | 191 |
| abstract_inverted_index.(20 | 35 |
| abstract_inverted_index.100 | 48 |
| abstract_inverted_index.300 | 32 |
| abstract_inverted_index.ASR | 190 |
| abstract_inverted_index.WER | 92 |
| abstract_inverted_index.and | 11, 27, 60, 82, 111, 132 |
| abstract_inverted_index.any | 146 |
| abstract_inverted_index.are | 20 |
| abstract_inverted_index.but | 98 |
| abstract_inverted_index.for | 77, 83, 101, 188 |
| abstract_inverted_index.how | 12 |
| abstract_inverted_index.may | 158, 167, 182 |
| abstract_inverted_index.the | 58, 69 |
| abstract_inverted_index.vs. | 116 |
| abstract_inverted_index.was | 75, 99 |
| abstract_inverted_index.⅓ | 52 |
| abstract_inverted_index.⅔ | 61 |
| abstract_inverted_index.(The | 55, 66 |
| abstract_inverted_index.ASR. | 186 |
| abstract_inverted_index.Both | 109 |
| abstract_inverted_index.WERs | 122 |
| abstract_inverted_index.data | 181 |
| abstract_inverted_index.from | 89, 114, 138, 145 |
| abstract_inverted_index.half | 37, 39, 41, 43 |
| abstract_inverted_index.have | 159 |
| abstract_inverted_index.more | 177, 184 |
| abstract_inverted_index.much | 13 |
| abstract_inverted_index.need | 168 |
| abstract_inverted_index.only | 137 |
| abstract_inverted_index.rate | 73 |
| abstract_inverted_index.same | 90 |
| abstract_inverted_index.than | 103, 172 |
| abstract_inverted_index.that | 127, 156, 176 |
| abstract_inverted_index.with | 51, 94 |
| abstract_inverted_index.(WER) | 74 |
| abstract_inverted_index.These | 18 |
| abstract_inverted_index.about | 8 |
| abstract_inverted_index.cows) | 59 |
| abstract_inverted_index.ended | 46 |
| abstract_inverted_index.error | 72 |
| abstract_inverted_index.final | 78, 85 |
| abstract_inverted_index.human | 16, 29, 193 |
| abstract_inverted_index.model | 9, 26, 96, 192 |
| abstract_inverted_index.size, | 97 |
| abstract_inverted_index.using | 120, 189 |
| abstract_inverted_index.while | 141 |
| abstract_inverted_index.words | 79, 86 |
| abstract_inverted_index.yield | 183 |
| abstract_inverted_index.Humans | 149 |
| abstract_inverted_index.farmer | 56 |
| abstract_inverted_index.higher | 100 |
| abstract_inverted_index.humans | 112, 133, 135 |
| abstract_inverted_index.milked | 57, 68 |
| abstract_inverted_index.mirror | 15 |
| abstract_inverted_index.models | 14, 110, 131, 142, 151 |
| abstract_inverted_index.nose). | 70 |
| abstract_inverted_index.remain | 7 |
| abstract_inverted_index.speech | 3, 194 |
| abstract_inverted_index.target | 49 |
| abstract_inverted_index.varied | 179 |
| abstract_inverted_index.words, | 50, 154 |
| abstract_inverted_index.Despite | 0 |
| abstract_inverted_index.English | 33 |
| abstract_inverted_index.Whisper | 25, 157 |
| abstract_inverted_index.context | 129, 171 |
| abstract_inverted_index.degrees | 63 |
| abstract_inverted_index.excised | 88 |
| abstract_inverted_index.humans. | 173 |
| abstract_inverted_index.speech, | 105 |
| abstract_inverted_index.suggest | 175 |
| abstract_inverted_index.varying | 62 |
| abstract_inverted_index.Findings | 174 |
| abstract_inverted_index.However, | 119 |
| abstract_inverted_index.OpenAI's | 24 |
| abstract_inverted_index.acoustic | 170 |
| abstract_inverted_index.affected | 130 |
| abstract_inverted_index.baseline | 125 |
| abstract_inverted_index.context. | 148 |
| abstract_inverted_index.examined | 76 |
| abstract_inverted_index.isolated | 84, 153 |
| abstract_inverted_index.revealed | 126 |
| abstract_inverted_index.sentence | 128, 147 |
| abstract_inverted_index.somewhat | 144 |
| abstract_inverted_index.training | 180 |
| abstract_inverted_index.Potential | 187 |
| abstract_inverted_index.Sentences | 45 |
| abstract_inverted_index.addressed | 21 |
| abstract_inverted_index.benefited | 113, 136, 143 |
| abstract_inverted_index.comparing | 23 |
| abstract_inverted_index.decreased | 93 |
| abstract_inverted_index.imperfect | 107 |
| abstract_inverted_index.lengthier | 169 |
| abstract_inverted_index.questions | 6, 19 |
| abstract_inverted_index.sentences | 34, 54, 81 |
| abstract_inverted_index.speakers, | 36 |
| abstract_inverted_index.discussed. | 197 |
| abstract_inverted_index.inclusive, | 178 |
| abstract_inverted_index.increasing | 95 |
| abstract_inverted_index.restricted | 161 |
| abstract_inverted_index.sentences, | 140 |
| abstract_inverted_index.sentences. | 118 |
| abstract_inverted_index.suggesting | 106, 155 |
| abstract_inverted_index.utterances | 165 |
| abstract_inverted_index.(recordings | 87 |
| abstract_inverted_index.Target-word | 71 |
| abstract_inverted_index.US-accented | 104 |
| abstract_inverted_index.perception. | 17 |
| abstract_inverted_index.recognition | 4 |
| abstract_inverted_index.sentences). | 91 |
| abstract_inverted_index.single-word | 164 |
| abstract_inverted_index.technology, | 5 |
| abstract_inverted_index.US-accented, | 42 |
| abstract_inverted_index.adaptability | 195 |
| abstract_inverted_index.advancements | 1 |
| abstract_inverted_index.differently: | 134 |
| abstract_inverted_index.distribution | 162 |
| abstract_inverted_index.outperformed | 150 |
| abstract_inverted_index.transcribers | 30 |
| abstract_inverted_index.farmer/barmer | 67 |
| abstract_inverted_index.generalizable | 185 |
| abstract_inverted_index.isolated-word | 121 |
| abstract_inverted_index.Spanish-accented | 102 |
| abstract_inverted_index.generalizability | 10 |
| abstract_inverted_index.less-predictable | 117 |
| abstract_inverted_index.more-predictable | 115 |
| abstract_inverted_index.generalizability. | 108 |
| abstract_inverted_index.low-predictability | 65 |
| abstract_inverted_index.high-predictability | 53, 139 |
| abstract_inverted_index.[Mexican-]Spanish-accented). | 44 |
| cited_by_percentile_year | |
| countries_distinct_count | 1 |
| institutions_distinct_count | 3 |
| citation_normalized_percentile.value | 0.28649709 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | False |
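
The `abstract_inverted_index.*` rows in the payload map each token to the word positions where it occurs in the abstract. A minimal sketch of turning such an index back into running text follows; the function name `rebuild_abstract` is hypothetical, and the sample dictionary is only an excerpt restricted to positions 0 through 7 of the rows listed in the table.

```python
def rebuild_abstract(inverted_index: dict) -> str:
    """Rebuild text from a {token: [positions]} inverted index."""
    by_position = {}
    for token, positions in inverted_index.items():
        for position in positions:
            by_position[position] = token
    # Emit tokens in ascending position order, separated by single spaces.
    return " ".join(by_position[i] for i in sorted(by_position))


# Excerpt of the payload: only the entries (and positions) below 8 are kept.
sample_index = {
    "remain": [7],
    "questions": [6],
    "technology,": [5],
    "recognition": [4],
    "speech": [3],
    "in": [2],
    "advancements": [1],
    "Despite": [0],
}

opening = rebuild_abstract(sample_index)
print(opening)
```

Because punctuation stays attached to its token in the index, the rebuilt string keeps the original commas and periods; only the whitespace between tokens is normalized to single spaces.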