Multi-modal Speech Emotion Recognition via Feature Distribution Adaptation Network
2024 · Open Access
· DOI: https://doi.org/10.48550/arxiv.2410.22023
In this paper, we propose a novel deep inductive transfer learning framework, named feature distribution adaptation network, to tackle the challenging multi-modal speech emotion recognition problem. Our method uses deep transfer learning strategies to align the visual and audio feature distributions and obtain a consistent representation of emotion, thereby improving speech emotion recognition performance. In our model, a pre-trained ResNet-34 is used to extract features from facial expression images and acoustic Mel spectrograms, respectively. A cross-attention mechanism is then introduced to model the intrinsic similarity relationships between the multi-modal features. Finally, multi-modal feature distribution adaptation is performed efficiently with a feed-forward network, which is extended using the local maximum mean discrepancy loss. Experiments on two benchmark datasets demonstrate that our model achieves excellent performance compared with existing methods.
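The abstract outlines the architecture at a high level: ResNet-34 features per modality, cross-attention fusion, and a feed-forward network trained with a local maximum mean discrepancy (LMMD) alignment loss. The PyTorch sketch below only illustrates the general shape of such a model; the dimensions, the pooling step, and the use of a plain (global) MMD in place of the paper's class-weighted local MMD are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): cross-attention between visual and
# audio features plus an MMD-style distribution-alignment loss.
import torch
import torch.nn as nn


def mmd_rbf(x, y, sigma=1.0):
    """Squared maximum mean discrepancy between two feature batches (RBF kernel)."""
    def kernel(a, b):
        dist = torch.cdist(a, b) ** 2          # pairwise squared distances
        return torch.exp(-dist / (2 * sigma ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()


class CrossModalFusion(nn.Module):
    def __init__(self, dim=512, heads=8, num_classes=7):
        super().__init__()
        # Feature dimension of 512 assumes a ResNet-34 backbone, per the abstract.
        self.attn_v2a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_a2v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, visual, audio):
        # visual, audio: (batch, tokens, dim) token sequences per modality
        v, _ = self.attn_v2a(visual, audio, audio)   # visual queries attend to audio
        a, _ = self.attn_a2v(audio, visual, visual)  # audio queries attend to visual
        v, a = v.mean(dim=1), a.mean(dim=1)          # pool tokens to one vector per sample
        fused = self.ffn(torch.cat([v, a], dim=-1))
        align_loss = mmd_rbf(v, a)                   # pulls the modal distributions together
        return self.classifier(fused), align_loss
```

In training, the classification cross-entropy and the alignment term would typically be combined as a weighted sum, as is common for MMD-based adaptation losses.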
Record Details
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2410.22023
- PDF: https://arxiv.org/pdf/2410.22023
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4404350088
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4404350088 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2410.22023 (Digital Object Identifier)
- Title: Multi-modal Speech Emotion Recognition via Feature Distribution Adaptation Network (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-10-29 (full publication date if available)
- Authors: Shaokai Li, Yixuan Ji, Peng Song, Haoqin Sun, Wenming Zheng (list of authors in order)
- Landing page: https://arxiv.org/abs/2410.22023 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2410.22023 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2410.22023 (direct OA link when available)
- Concepts: Modal, Feature (linguistics), Adaptation (eye), Speech recognition, Computer science, Emotion recognition, Natural language processing, Psychology, Linguistics, Neuroscience, Materials science, Polymer chemistry, Philosophy (top concepts attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
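A record like this can be retrieved programmatically from the OpenAlex works endpoint; the JSON it returns contains the fields flattened in the payload table below. A minimal Python sketch, assuming the `requests` package is installed (the `mailto` address is a placeholder for the polite pool):

```python
import requests

WORK_ID = "W4404350088"
resp = requests.get(
    f"https://api.openalex.org/works/{WORK_ID}",
    params={"mailto": "you@example.com"},  # placeholder contact address
    timeout=30,
)
resp.raise_for_status()
work = resp.json()

print(work["display_name"])           # title
print(work["publication_date"])       # 2024-10-29
print(work["open_access"]["oa_url"])  # direct OA link when available
print([t["display_name"] for t in work["topics"]])
```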
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4404350088 |
| doi | https://doi.org/10.48550/arxiv.2410.22023 |
| ids.doi | https://doi.org/10.48550/arxiv.2410.22023 |
| ids.openalex | https://openalex.org/W4404350088 |
| fwci | |
| type | preprint |
| title | Multi-modal Speech Emotion Recognition via Feature Distribution Adaptation Network |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10667 |
| topics[0].field.id | https://openalex.org/fields/32 |
| topics[0].field.display_name | Psychology |
| topics[0].score | 0.8935999870300293 |
| topics[0].domain.id | https://openalex.org/domains/2 |
| topics[0].domain.display_name | Social Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/3205 |
| topics[0].subfield.display_name | Experimental and Cognitive Psychology |
| topics[0].display_name | Emotion and Mood Recognition |
| topics[1].id | https://openalex.org/T10860 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.8458999991416931 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1711 |
| topics[1].subfield.display_name | Signal Processing |
| topics[1].display_name | Speech and Audio Processing |
| topics[2].id | https://openalex.org/T10201 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.7409999966621399 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Speech Recognition and Synthesis |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C71139939 |
| concepts[0].level | 2 |
| concepts[0].score | 0.6936081647872925 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q910194 |
| concepts[0].display_name | Modal |
| concepts[1].id | https://openalex.org/C2776401178 |
| concepts[1].level | 2 |
| concepts[1].score | 0.6861782670021057 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q12050496 |
| concepts[1].display_name | Feature (linguistics) |
| concepts[2].id | https://openalex.org/C139807058 |
| concepts[2].level | 2 |
| concepts[2].score | 0.6520419120788574 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q352374 |
| concepts[2].display_name | Adaptation (eye) |
| concepts[3].id | https://openalex.org/C28490314 |
| concepts[3].level | 1 |
| concepts[3].score | 0.6410506963729858 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q189436 |
| concepts[3].display_name | Speech recognition |
| concepts[4].id | https://openalex.org/C41008148 |
| concepts[4].level | 0 |
| concepts[4].score | 0.5676463842391968 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[4].display_name | Computer science |
| concepts[5].id | https://openalex.org/C2777438025 |
| concepts[5].level | 2 |
| concepts[5].score | 0.5570142865180969 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q1339090 |
| concepts[5].display_name | Emotion recognition |
| concepts[6].id | https://openalex.org/C204321447 |
| concepts[6].level | 1 |
| concepts[6].score | 0.3535500764846802 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[6].display_name | Natural language processing |
| concepts[7].id | https://openalex.org/C15744967 |
| concepts[7].level | 0 |
| concepts[7].score | 0.280222088098526 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q9418 |
| concepts[7].display_name | Psychology |
| concepts[8].id | https://openalex.org/C41895202 |
| concepts[8].level | 1 |
| concepts[8].score | 0.2039804458618164 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q8162 |
| concepts[8].display_name | Linguistics |
| concepts[9].id | https://openalex.org/C169760540 |
| concepts[9].level | 1 |
| concepts[9].score | 0.048895061016082764 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q207011 |
| concepts[9].display_name | Neuroscience |
| concepts[10].id | https://openalex.org/C192562407 |
| concepts[10].level | 0 |
| concepts[10].score | 0.04522326588630676 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q228736 |
| concepts[10].display_name | Materials science |
| concepts[11].id | https://openalex.org/C188027245 |
| concepts[11].level | 1 |
| concepts[11].score | 0.0 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q750446 |
| concepts[11].display_name | Polymer chemistry |
| concepts[12].id | https://openalex.org/C138885662 |
| concepts[12].level | 0 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q5891 |
| concepts[12].display_name | Philosophy |
| keywords[0].id | https://openalex.org/keywords/modal |
| keywords[0].score | 0.6936081647872925 |
| keywords[0].display_name | Modal |
| keywords[1].id | https://openalex.org/keywords/feature |
| keywords[1].score | 0.6861782670021057 |
| keywords[1].display_name | Feature (linguistics) |
| keywords[2].id | https://openalex.org/keywords/adaptation |
| keywords[2].score | 0.6520419120788574 |
| keywords[2].display_name | Adaptation (eye) |
| keywords[3].id | https://openalex.org/keywords/speech-recognition |
| keywords[3].score | 0.6410506963729858 |
| keywords[3].display_name | Speech recognition |
| keywords[4].id | https://openalex.org/keywords/computer-science |
| keywords[4].score | 0.5676463842391968 |
| keywords[4].display_name | Computer science |
| keywords[5].id | https://openalex.org/keywords/emotion-recognition |
| keywords[5].score | 0.5570142865180969 |
| keywords[5].display_name | Emotion recognition |
| keywords[6].id | https://openalex.org/keywords/natural-language-processing |
| keywords[6].score | 0.3535500764846802 |
| keywords[6].display_name | Natural language processing |
| keywords[7].id | https://openalex.org/keywords/psychology |
| keywords[7].score | 0.280222088098526 |
| keywords[7].display_name | Psychology |
| keywords[8].id | https://openalex.org/keywords/linguistics |
| keywords[8].score | 0.2039804458618164 |
| keywords[8].display_name | Linguistics |
| keywords[9].id | https://openalex.org/keywords/neuroscience |
| keywords[9].score | 0.048895061016082764 |
| keywords[9].display_name | Neuroscience |
| keywords[10].id | https://openalex.org/keywords/materials-science |
| keywords[10].score | 0.04522326588630676 |
| keywords[10].display_name | Materials science |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2410.22023 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2410.22023 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2410.22023 |
| locations[1].id | doi:10.48550/arxiv.2410.22023 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2410.22023 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5000061781 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-1684-043X |
| authorships[0].author.display_name | Shaokai Li |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Li, Shaokai |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5057258581 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Yixuan Ji |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Ji, Yixuan |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5009279384 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-6567-663X |
| authorships[2].author.display_name | Peng Song |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Song, Peng |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5035524175 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-8554-8969 |
| authorships[3].author.display_name | Haoqin Sun |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Sun, Haoqin |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5029771864 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-7764-5179 |
| authorships[4].author.display_name | Wenming Zheng |
| authorships[4].author_position | last |
| authorships[4].raw_author_name | Zheng, Wenming |
| authorships[4].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2410.22023 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Multi-modal Speech Emotion Recognition via Feature Distribution Adaptation Network |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10667 |
| primary_topic.field.id | https://openalex.org/fields/32 |
| primary_topic.field.display_name | Psychology |
| primary_topic.score | 0.8935999870300293 |
| primary_topic.domain.id | https://openalex.org/domains/2 |
| primary_topic.domain.display_name | Social Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/3205 |
| primary_topic.subfield.display_name | Experimental and Cognitive Psychology |
| primary_topic.display_name | Emotion and Mood Recognition |
| related_works | https://openalex.org/W2997567050, https://openalex.org/W1483272040, https://openalex.org/W4283377908, https://openalex.org/W1526712007, https://openalex.org/W1533421371, https://openalex.org/W2003050223, https://openalex.org/W3105646692, https://openalex.org/W4387914125, https://openalex.org/W3126677997, https://openalex.org/W1610857240 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2410.22023 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2410.22023 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2410.22023 |
| primary_location.id | pmh:oai:arXiv.org:2410.22023 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2410.22023 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2410.22023 |
| publication_date | 2024-10-29 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 5 |
| abstract_inverted_index.In | 0, 56 |
| abstract_inverted_index.is | 62, 80, 97, 104 |
| abstract_inverted_index.of | 46, 52, 88 |
| abstract_inverted_index.on | 117 |
| abstract_inverted_index.to | 17, 29, 35, 42, 82 |
| abstract_inverted_index.we | 3 |
| abstract_inverted_index.Mel | 73 |
| abstract_inverted_index.Our | 26 |
| abstract_inverted_index.and | 38, 71, 121 |
| abstract_inverted_index.are | 114 |
| abstract_inverted_index.can | 128 |
| abstract_inverted_index.for | 64, 67 |
| abstract_inverted_index.our | 57, 126 |
| abstract_inverted_index.out | 116 |
| abstract_inverted_index.the | 19, 50, 59, 77, 84, 92, 107, 122 |
| abstract_inverted_index.two | 118 |
| abstract_inverted_index.use | 30 |
| abstract_inverted_index.aims | 28 |
| abstract_inverted_index.deep | 7, 31 |
| abstract_inverted_index.mean | 110 |
| abstract_inverted_index.that | 125 |
| abstract_inverted_index.this | 1 |
| abstract_inverted_index.with | 100, 133 |
| abstract_inverted_index.Then, | 76 |
| abstract_inverted_index.align | 36 |
| abstract_inverted_index.audio | 39 |
| abstract_inverted_index.local | 108 |
| abstract_inverted_index.loss. | 112 |
| abstract_inverted_index.model | 83, 127 |
| abstract_inverted_index.named | 12 |
| abstract_inverted_index.novel | 6 |
| abstract_inverted_index.ones. | 135 |
| abstract_inverted_index.using | 106 |
| abstract_inverted_index.which | 103 |
| abstract_inverted_index.facial | 68 |
| abstract_inverted_index.images | 70 |
| abstract_inverted_index.method | 27 |
| abstract_inverted_index.model, | 58 |
| abstract_inverted_index.obtain | 43 |
| abstract_inverted_index.paper, | 2 |
| abstract_inverted_index.speech | 22, 53 |
| abstract_inverted_index.tackle | 18 |
| abstract_inverted_index.visual | 37 |
| abstract_inverted_index.achieve | 129 |
| abstract_inverted_index.carried | 115 |
| abstract_inverted_index.emotion | 23, 54 |
| abstract_inverted_index.feature | 13, 40, 65, 94 |
| abstract_inverted_index.maximum | 109 |
| abstract_inverted_index.propose | 4 |
| abstract_inverted_index.results | 123 |
| abstract_inverted_index.thereby | 48 |
| abstract_inverted_index.Finally, | 91 |
| abstract_inverted_index.acoustic | 72 |
| abstract_inverted_index.compared | 132 |
| abstract_inverted_index.emotion, | 47 |
| abstract_inverted_index.existing | 134 |
| abstract_inverted_index.extended | 105 |
| abstract_inverted_index.learning | 10, 33 |
| abstract_inverted_index.network, | 16, 102 |
| abstract_inverted_index.problem. | 25 |
| abstract_inverted_index.transfer | 9, 32 |
| abstract_inverted_index.utilized | 63 |
| abstract_inverted_index.ResNet-34 | 61 |
| abstract_inverted_index.benchmark | 119 |
| abstract_inverted_index.datasets, | 120 |
| abstract_inverted_index.excellent | 130 |
| abstract_inverted_index.features. | 90 |
| abstract_inverted_index.improving | 49 |
| abstract_inverted_index.inductive | 8 |
| abstract_inverted_index.intrinsic | 85 |
| abstract_inverted_index.mechanism | 79 |
| abstract_inverted_index.performed | 98 |
| abstract_inverted_index.adaptation | 15, 96 |
| abstract_inverted_index.consistent | 44 |
| abstract_inverted_index.expression | 69 |
| abstract_inverted_index.extraction | 66 |
| abstract_inverted_index.framework, | 11 |
| abstract_inverted_index.introduced | 81 |
| abstract_inverted_index.similarity | 86 |
| abstract_inverted_index.strategies | 34 |
| abstract_inverted_index.Experiments | 113 |
| abstract_inverted_index.challenging | 20 |
| abstract_inverted_index.demonstrate | 124 |
| abstract_inverted_index.discrepancy | 111 |
| abstract_inverted_index.efficiently | 99 |
| abstract_inverted_index.multi-modal | 21, 89, 93 |
| abstract_inverted_index.performance | 51, 131 |
| abstract_inverted_index.pre-trained | 60 |
| abstract_inverted_index.recognition | 24 |
| abstract_inverted_index.distribution | 14, 95 |
| abstract_inverted_index.feed-forward | 101 |
| abstract_inverted_index.recognition. | 55 |
| abstract_inverted_index.distributions | 41 |
| abstract_inverted_index.relationships | 87 |
| abstract_inverted_index.respectively. | 75 |
| abstract_inverted_index.spectrograms, | 74 |
| abstract_inverted_index.representation | 45 |
| abstract_inverted_index.cross-attention | 78 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 5 |
| citation_normalized_percentile |
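The `abstract_inverted_index.*` rows in the payload table map each token of the abstract to the word positions where it occurs; OpenAlex stores abstracts in this inverted form rather than as plain text. A minimal sketch of rebuilding the abstract from such an index, assuming the flattened rows have been parsed back into a dict of token to position list:

```python
# Rebuild a plain-text abstract from an OpenAlex-style inverted index
# (token -> list of word positions). The sample dict is a tiny, hand-picked
# subset of the rows above, used only for illustration.
def rebuild_abstract(inverted_index: dict) -> str:
    positions = {}
    for token, idxs in inverted_index.items():
        for idx in idxs:
            positions[idx] = token
    return " ".join(positions[i] for i in sorted(positions))


sample = {"In": [0], "this": [1], "paper,": [2], "we": [3], "propose": [4], "a": [5], "novel": [6]}
print(rebuild_abstract(sample))  # -> "In this paper, we propose a novel"
```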