Learning Local to Global Feature Aggregation for Speech Emotion Recognition
2023 · Open Access · DOI: https://doi.org/10.48550/arxiv.2306.01491
The Transformer has recently emerged in speech emotion recognition (SER). However, its equal patch division not only damages frequency information but also ignores local emotion correlations across frames, which are key cues for representing emotion. To address this issue, we propose Local to Global Feature Aggregation learning (LGFA) for SER, which aggregates long-term emotion correlations at different scales, both inside frames and inside segments, while keeping the entire frequency information, to enhance the emotion discrimination of utterance-level speech features. For this purpose, we nest a Frame Transformer inside a Segment Transformer. First, the Frame Transformer is designed to mine local emotion correlations between frames and produce frame embeddings. Then, the frame embeddings and their corresponding segment features are aggregated as complementary information from different levels and fed into the Segment Transformer to learn utterance-level global emotion features. Experimental results show that LGFA outperforms state-of-the-art methods.
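The data flow described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the abstract gives no layer sizes, projections, or fusion rule, so the identity-projection attention, the random stand-in for a learned linear layer, the mean pooling, the additive aggregation, and the names `self_attention`, `local_to_global`, and `frames_per_segment` are all assumptions made for illustration.

```python
import numpy as np


def self_attention(x):
    """Single-head scaled dot-product self-attention over the token axis.

    x has shape (..., tokens, dim). Learned Q/K/V projections are omitted
    (identity projections), which is an assumption made for brevity.
    """
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(x.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x


def local_to_global(spectrogram, frames_per_segment, rng):
    """Hypothetical local-to-global aggregation of a (frames, freq_bins) array."""
    num_frames, freq_bins = spectrogram.shape
    num_segments = num_frames // frames_per_segment
    if num_segments == 0:
        raise ValueError("need at least one full segment of frames")

    # Drop trailing frames that do not fill a whole segment (an assumption).
    usable = spectrogram[: num_segments * frames_per_segment]
    # Each frame is one token that keeps its entire frequency axis.
    segments = usable.reshape(num_segments, frames_per_segment, freq_bins)

    # Local step: attention between the frames inside each segment.
    frame_embeddings = segments + self_attention(segments)  # residual connection

    # Segment features: flatten each segment and project it to freq_bins dims.
    # Random weights stand in for a learned linear layer.
    flat_dim = frames_per_segment * freq_bins
    projection = rng.standard_normal((flat_dim, freq_bins)) / np.sqrt(flat_dim)
    segment_features = segments.reshape(num_segments, flat_dim) @ projection

    # Aggregate both levels: pooled frame embeddings plus segment features.
    segment_tokens = frame_embeddings.mean(axis=1) + segment_features

    # Global step: attention between segments, then pool to one utterance vector.
    segment_tokens = segment_tokens + self_attention(segment_tokens)
    return segment_tokens.mean(axis=0)


rng = np.random.default_rng(0)
spectrogram = rng.standard_normal((50, 26))  # 50 frames, 26 frequency bins
utterance_vector = local_to_global(spectrogram, frames_per_segment=10, rng=rng)
print(utterance_vector.shape)  # (26,)
```

Treating every frame as a token that spans all frequency bins is what keeps the frequency axis intact in this sketch, in contrast to cutting the spectrogram into equal two-dimensional patches.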
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2306.01491, https://arxiv.org/pdf/2306.01491
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4379474365
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4379474365 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2306.01491 (Digital Object Identifier)
- Title: Learning Local to Global Feature Aggregation for Speech Emotion Recognition (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2023 (year of publication)
- Publication date: 2023-06-02 (full publication date if available)
- Authors: Cheng Lu, Hailun Lian, Wenming Zheng, Yuan Zong, Yan Zhao, Sunan Li (list of authors in order)
- Landing page: https://arxiv.org/abs/2306.01491 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2306.01491 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2306.01491 (direct OA link when available)
- Concepts: Transformer, Utterance, Computer science, Speech recognition, Frame (networking), Artificial intelligence, Natural language processing, Engineering, Telecommunications, Voltage, Electrical engineering (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4379474365 |
| doi | https://doi.org/10.48550/arxiv.2306.01491 |
| ids.doi | https://doi.org/10.48550/arxiv.2306.01491 |
| ids.openalex | https://openalex.org/W4379474365 |
| fwci | |
| type | preprint |
| title | Learning Local to Global Feature Aggregation for Speech Emotion Recognition |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10667 |
| topics[0].field.id | https://openalex.org/fields/32 |
| topics[0].field.display_name | Psychology |
| topics[0].score | 0.9883999824523926 |
| topics[0].domain.id | https://openalex.org/domains/2 |
| topics[0].domain.display_name | Social Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/3205 |
| topics[0].subfield.display_name | Experimental and Cognitive Psychology |
| topics[0].display_name | Emotion and Mood Recognition |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C66322947 |
| concepts[0].level | 3 |
| concepts[0].score | 0.7243636846542358 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q11658 |
| concepts[0].display_name | Transformer |
| concepts[1].id | https://openalex.org/C2775852435 |
| concepts[1].level | 2 |
| concepts[1].score | 0.7149534821510315 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q258403 |
| concepts[1].display_name | Utterance |
| concepts[2].id | https://openalex.org/C41008148 |
| concepts[2].level | 0 |
| concepts[2].score | 0.6141499280929565 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[2].display_name | Computer science |
| concepts[3].id | https://openalex.org/C28490314 |
| concepts[3].level | 1 |
| concepts[3].score | 0.5306070446968079 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q189436 |
| concepts[3].display_name | Speech recognition |
| concepts[4].id | https://openalex.org/C126042441 |
| concepts[4].level | 2 |
| concepts[4].score | 0.4149740934371948 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q1324888 |
| concepts[4].display_name | Frame (networking) |
| concepts[5].id | https://openalex.org/C154945302 |
| concepts[5].level | 1 |
| concepts[5].score | 0.41148796677589417 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[5].display_name | Artificial intelligence |
| concepts[6].id | https://openalex.org/C204321447 |
| concepts[6].level | 1 |
| concepts[6].score | 0.3324238359928131 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[6].display_name | Natural language processing |
| concepts[7].id | https://openalex.org/C127413603 |
| concepts[7].level | 0 |
| concepts[7].score | 0.15600994229316711 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q11023 |
| concepts[7].display_name | Engineering |
| concepts[8].id | https://openalex.org/C76155785 |
| concepts[8].level | 1 |
| concepts[8].score | 0.06436783075332642 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q418 |
| concepts[8].display_name | Telecommunications |
| concepts[9].id | https://openalex.org/C165801399 |
| concepts[9].level | 2 |
| concepts[9].score | 0.0 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q25428 |
| concepts[9].display_name | Voltage |
| concepts[10].id | https://openalex.org/C119599485 |
| concepts[10].level | 1 |
| concepts[10].score | 0.0 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q43035 |
| concepts[10].display_name | Electrical engineering |
| keywords[0].id | https://openalex.org/keywords/transformer |
| keywords[0].score | 0.7243636846542358 |
| keywords[0].display_name | Transformer |
| keywords[1].id | https://openalex.org/keywords/utterance |
| keywords[1].score | 0.7149534821510315 |
| keywords[1].display_name | Utterance |
| keywords[2].id | https://openalex.org/keywords/computer-science |
| keywords[2].score | 0.6141499280929565 |
| keywords[2].display_name | Computer science |
| keywords[3].id | https://openalex.org/keywords/speech-recognition |
| keywords[3].score | 0.5306070446968079 |
| keywords[3].display_name | Speech recognition |
| keywords[4].id | https://openalex.org/keywords/frame |
| keywords[4].score | 0.4149740934371948 |
| keywords[4].display_name | Frame (networking) |
| keywords[5].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[5].score | 0.41148796677589417 |
| keywords[5].display_name | Artificial intelligence |
| keywords[6].id | https://openalex.org/keywords/natural-language-processing |
| keywords[6].score | 0.3324238359928131 |
| keywords[6].display_name | Natural language processing |
| keywords[7].id | https://openalex.org/keywords/engineering |
| keywords[7].score | 0.15600994229316711 |
| keywords[7].display_name | Engineering |
| keywords[8].id | https://openalex.org/keywords/telecommunications |
| keywords[8].score | 0.06436783075332642 |
| keywords[8].display_name | Telecommunications |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2306.01491 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | cc-by |
| locations[0].pdf_url | https://arxiv.org/pdf/2306.01491 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | https://openalex.org/licenses/cc-by |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2306.01491 |
| locations[1].id | doi:10.48550/arxiv.2306.01491 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2306.01491 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5112404880 |
| authorships[0].author.orcid | https://orcid.org/0000-0001-5905-2321 |
| authorships[0].author.display_name | Cheng Lu |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Lu, Cheng |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5091060125 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-1355-9503 |
| authorships[1].author.display_name | Hailun Lian |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Lian, Hailun |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5029771864 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-7764-5179 |
| authorships[2].author.display_name | Wenming Zheng |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Zheng, Wenming |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5101064081 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Yuan Zong |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Zong, Yuan |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5100727732 |
| authorships[4].author.orcid | https://orcid.org/0000-0003-4577-7078 |
| authorships[4].author.display_name | Yan Zhao |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Zhao, Yan |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5114859532 |
| authorships[5].author.orcid | https://orcid.org/0000-0003-1494-4873 |
| authorships[5].author.display_name | Sunan Li |
| authorships[5].author_position | last |
| authorships[5].raw_author_name | Li, Sunan |
| authorships[5].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2306.01491 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2023-06-07T00:00:00 |
| display_name | Learning Local to Global Feature Aggregation for Speech Emotion Recognition |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10667 |
| primary_topic.field.id | https://openalex.org/fields/32 |
| primary_topic.field.display_name | Psychology |
| primary_topic.score | 0.9883999824523926 |
| primary_topic.domain.id | https://openalex.org/domains/2 |
| primary_topic.domain.display_name | Social Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/3205 |
| primary_topic.subfield.display_name | Experimental and Cognitive Psychology |
| primary_topic.display_name | Emotion and Mood Recognition |
| related_works | https://openalex.org/W2529301793, https://openalex.org/W2384121599, https://openalex.org/W2038083449, https://openalex.org/W3177678247, https://openalex.org/W2333799855, https://openalex.org/W1999617572, https://openalex.org/W2944572343, https://openalex.org/W2351687372, https://openalex.org/W3016124757, https://openalex.org/W3034520363 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2306.01491 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | cc-by |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2306.01491 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2306.01491 |
| primary_location.id | pmh:oai:arXiv.org:2306.01491 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | cc-by |
| primary_location.pdf_url | https://arxiv.org/pdf/2306.01491 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | https://openalex.org/licenses/cc-by |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2306.01491 |
| publication_date | 2023-06-02 |
| publication_year | 2023 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 41, 83, 87 |
| abstract_inverted_index.To | 35 |
| abstract_inverted_index.as | 116 |
| abstract_inverted_index.at | 8, 57 |
| abstract_inverted_index.be | 120 |
| abstract_inverted_index.in | 3 |
| abstract_inverted_index.is | 93, 139 |
| abstract_inverted_index.of | 74, 137 |
| abstract_inverted_index.to | 32, 43, 69, 95, 119, 141 |
| abstract_inverted_index.we | 39, 81 |
| abstract_inverted_index.For | 78 |
| abstract_inverted_index.and | 63, 109 |
| abstract_inverted_index.are | 29, 114 |
| abstract_inverted_index.but | 20 |
| abstract_inverted_index.can | 52 |
| abstract_inverted_index.fed | 121 |
| abstract_inverted_index.for | 49, 102, 125 |
| abstract_inverted_index.has | 1 |
| abstract_inverted_index.its | 11 |
| abstract_inverted_index.key | 30 |
| abstract_inverted_index.not | 15 |
| abstract_inverted_index.the | 37, 71, 106, 135, 142 |
| abstract_inverted_index.LGFA | 138 |
| abstract_inverted_index.SER, | 50 |
| abstract_inverted_index.also | 21 |
| abstract_inverted_index.both | 60 |
| abstract_inverted_index.cues | 31 |
| abstract_inverted_index.into | 122 |
| abstract_inverted_index.nest | 82 |
| abstract_inverted_index.only | 16 |
| abstract_inverted_index.show | 133 |
| abstract_inverted_index.that | 134 |
| abstract_inverted_index.this | 79 |
| abstract_inverted_index.with | 65 |
| abstract_inverted_index.(SER) | 7 |
| abstract_inverted_index.Frame | 84, 91 |
| abstract_inverted_index.Local | 42 |
| abstract_inverted_index.Then, | 105 |
| abstract_inverted_index.equal | 12 |
| abstract_inverted_index.frame | 103, 107 |
| abstract_inverted_index.local | 23, 97 |
| abstract_inverted_index.patch | 13 |
| abstract_inverted_index.their | 110 |
| abstract_inverted_index.which | 28, 51 |
| abstract_inverted_index.(LGFA) | 48 |
| abstract_inverted_index.Global | 44 |
| abstract_inverted_index.across | 26 |
| abstract_inverted_index.entire | 66 |
| abstract_inverted_index.frames | 62, 101 |
| abstract_inverted_index.global | 128 |
| abstract_inverted_index.handle | 36 |
| abstract_inverted_index.inside | 61, 86 |
| abstract_inverted_index.issue, | 38 |
| abstract_inverted_index.scales | 59 |
| abstract_inverted_index.speech | 4, 76 |
| abstract_inverted_index.Feature | 45 |
| abstract_inverted_index.Segment | 88, 123 |
| abstract_inverted_index.between | 100 |
| abstract_inverted_index.damages | 17 |
| abstract_inverted_index.emerged | 2 |
| abstract_inverted_index.emotion | 5, 24, 55, 72, 98, 129 |
| abstract_inverted_index.enhance | 70 |
| abstract_inverted_index.frames, | 27 |
| abstract_inverted_index.ignores | 22 |
| abstract_inverted_index.propose | 40 |
| abstract_inverted_index.results | 132 |
| abstract_inverted_index.segment | 112 |
| abstract_inverted_index.Firstly, | 90 |
| abstract_inverted_index.However, | 10 |
| abstract_inverted_index.designed | 94 |
| abstract_inverted_index.division | 14 |
| abstract_inverted_index.emotion. | 34 |
| abstract_inverted_index.excavate | 96 |
| abstract_inverted_index.features | 113 |
| abstract_inverted_index.learning | 47, 126 |
| abstract_inverted_index.longterm | 54 |
| abstract_inverted_index.methods. | 144 |
| abstract_inverted_index.present. | 9 |
| abstract_inverted_index.purpose, | 80 |
| abstract_inverted_index.segments | 64 |
| abstract_inverted_index.superior | 140 |
| abstract_inverted_index.aggregate | 53 |
| abstract_inverted_index.different | 58 |
| abstract_inverted_index.features. | 77, 130 |
| abstract_inverted_index.frequency | 18, 67 |
| abstract_inverted_index.represent | 33 |
| abstract_inverted_index.aggregated | 115 |
| abstract_inverted_index.embeddings | 108 |
| abstract_inverted_index.Aggregation | 46 |
| abstract_inverted_index.Transformer | 0, 85, 92, 124 |
| abstract_inverted_index.complements | 118 |
| abstract_inverted_index.embeddings. | 104 |
| abstract_inverted_index.information | 19, 68 |
| abstract_inverted_index.performance | 136 |
| abstract_inverted_index.recognition | 6 |
| abstract_inverted_index.Experimental | 131 |
| abstract_inverted_index.Transformer. | 89 |
| abstract_inverted_index.correlations | 25, 56, 99 |
| abstract_inverted_index.corresponding | 111 |
| abstract_inverted_index.discrimination | 73 |
| abstract_inverted_index.different-level | 117 |
| abstract_inverted_index.utterance-level | 75, 127 |
| abstract_inverted_index.state-of-the-art | 143 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 6 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/10 |
| sustainable_development_goals[0].score | 0.5299999713897705 |
| sustainable_development_goals[0].display_name | Reduced inequalities |
| sustainable_development_goals[1].id | https://metadata.un.org/sdg/16 |
| sustainable_development_goals[1].score | 0.44999998807907104 |
| sustainable_development_goals[1].display_name | Peace, Justice and strong institutions |
| citation_normalized_percentile | |
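
The `abstract_inverted_index.*` rows in the table above map each word of the abstract to the 0-based positions where it occurs. A short sketch of turning such a mapping back into running text is shown below; the function name `rebuild_abstract` is hypothetical, and the sample dictionary contains only the entries for positions 0 through 9 copied from the rows above, not the full index.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a mapping of word -> list of 0-based word positions."""
    word_at = {}
    for word, positions in inverted_index.items():
        for position in positions:
            word_at[position] = word
    # Join the words in position order; gaps in the index are simply skipped.
    return " ".join(word_at[position] for position in sorted(word_at))


# Entries for positions 0-9 only, taken from the payload table.
sample_index = {
    "Transformer": [0],
    "has": [1],
    "emerged": [2],
    "in": [3],
    "speech": [4],
    "emotion": [5],
    "recognition": [6],
    "(SER)": [7],
    "at": [8],
    "present.": [9],
}

print(rebuild_abstract(sample_index))
# Transformer has emerged in speech emotion recognition (SER) at present.
```

Passing the complete index, with every position list from the table, would yield the abstract as stored in the record.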