EmoSpeaker: One-shot Fine-grained Emotion-Controlled Talking Face Generation
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2402.01422
Implementing fine-grained emotion control is crucial for emotion generation tasks because it enhances the expressive capability of the generative model, allowing it to accurately and comprehensively capture and express various nuanced emotional states, thereby improving the emotional quality and personalization of generated content. Generating fine-grained facial animations that accurately portray emotional expressions using only a portrait and an audio recording presents a challenge. In order to address this challenge, we propose a visual attribute-guided audio decoupler. This enables the obtention of content vectors solely related to the audio content, enhancing the stability of subsequent lip movement coefficient predictions. To achieve more precise emotional expression, we introduce a fine-grained emotion coefficient prediction module. Additionally, we propose an emotion intensity control method using a fine-grained emotion matrix. Through these, effective control over emotional expression in the generated videos and finer classification of emotion intensity are accomplished. Subsequently, a series of 3DMM coefficient generation networks are designed to predict 3D coefficients, followed by the utilization of a rendering network to generate the final video. Our experimental results demonstrate that our proposed method, EmoSpeaker, outperforms existing emotional talking face generation methods in terms of expression variation and lip synchronization. Project page: https://peterfanfan.github.io/EmoSpeaker/
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2402.01422, https://arxiv.org/pdf/2402.01422
- OA Status: green
- Cited By: 1
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4391556105
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4391556105 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2402.01422 (Digital Object Identifier)
- Title: EmoSpeaker: One-shot Fine-grained Emotion-Controlled Talking Face Generation (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-02-02 (full publication date if available)
- Authors: Guanwen Feng, Haoran Cheng, Yunan Li, Zhiyuan Ma, Chaoneng Li, Zhihao Qian, Qiguang Miao, Chi‐Man Pun (list of authors in order)
- Landing page: https://arxiv.org/abs/2402.01422 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2402.01422 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2402.01422 (direct OA link when available)
- Concepts: Shot (pellet), Face (sociological concept), Psychology, Computer science, Cognitive psychology, Materials science, Linguistics, Philosophy, Metallurgy (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 1 (total citation count in OpenAlex)
- Citations by year (recent): 2025: 1 (per-year citation counts, last 5 years)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4391556105 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2402.01422 |
| ids.doi | https://doi.org/10.48550/arxiv.2402.01422 |
| ids.openalex | https://openalex.org/W4391556105 |
| fwci | |
| type | preprint |
| title | EmoSpeaker: One-shot Fine-grained Emotion-Controlled Talking Face Generation |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11448 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9988999962806702 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1707 |
| topics[0].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[0].display_name | Face recognition and analysis |
| topics[1].id | https://openalex.org/T10860 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9587000012397766 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1711 |
| topics[1].subfield.display_name | Signal Processing |
| topics[1].display_name | Speech and Audio Processing |
| topics[2].id | https://openalex.org/T10775 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9165999889373779 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1707 |
| topics[2].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[2].display_name | Generative Adversarial Networks and Image Synthesis |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C2778344882 |
| concepts[0].level | 2 |
| concepts[0].score | 0.7162990570068359 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q278938 |
| concepts[0].display_name | Shot (pellet) |
| concepts[1].id | https://openalex.org/C2779304628 |
| concepts[1].level | 2 |
| concepts[1].score | 0.616129994392395 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q3503480 |
| concepts[1].display_name | Face (sociological concept) |
| concepts[2].id | https://openalex.org/C15744967 |
| concepts[2].level | 0 |
| concepts[2].score | 0.4096476435661316 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q9418 |
| concepts[2].display_name | Psychology |
| concepts[3].id | https://openalex.org/C41008148 |
| concepts[3].level | 0 |
| concepts[3].score | 0.3405606150627136 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[3].display_name | Computer science |
| concepts[4].id | https://openalex.org/C180747234 |
| concepts[4].level | 1 |
| concepts[4].score | 0.32230597734451294 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q23373 |
| concepts[4].display_name | Cognitive psychology |
| concepts[5].id | https://openalex.org/C192562407 |
| concepts[5].level | 0 |
| concepts[5].score | 0.2011055052280426 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q228736 |
| concepts[5].display_name | Materials science |
| concepts[6].id | https://openalex.org/C41895202 |
| concepts[6].level | 1 |
| concepts[6].score | 0.11346381902694702 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q8162 |
| concepts[6].display_name | Linguistics |
| concepts[7].id | https://openalex.org/C138885662 |
| concepts[7].level | 0 |
| concepts[7].score | 0.08791115880012512 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q5891 |
| concepts[7].display_name | Philosophy |
| concepts[8].id | https://openalex.org/C191897082 |
| concepts[8].level | 1 |
| concepts[8].score | 0.055381208658218384 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q11467 |
| concepts[8].display_name | Metallurgy |
| keywords[0].id | https://openalex.org/keywords/shot |
| keywords[0].score | 0.7162990570068359 |
| keywords[0].display_name | Shot (pellet) |
| keywords[1].id | https://openalex.org/keywords/face |
| keywords[1].score | 0.616129994392395 |
| keywords[1].display_name | Face (sociological concept) |
| keywords[2].id | https://openalex.org/keywords/psychology |
| keywords[2].score | 0.4096476435661316 |
| keywords[2].display_name | Psychology |
| keywords[3].id | https://openalex.org/keywords/computer-science |
| keywords[3].score | 0.3405606150627136 |
| keywords[3].display_name | Computer science |
| keywords[4].id | https://openalex.org/keywords/cognitive-psychology |
| keywords[4].score | 0.32230597734451294 |
| keywords[4].display_name | Cognitive psychology |
| keywords[5].id | https://openalex.org/keywords/materials-science |
| keywords[5].score | 0.2011055052280426 |
| keywords[5].display_name | Materials science |
| keywords[6].id | https://openalex.org/keywords/linguistics |
| keywords[6].score | 0.11346381902694702 |
| keywords[6].display_name | Linguistics |
| keywords[7].id | https://openalex.org/keywords/philosophy |
| keywords[7].score | 0.08791115880012512 |
| keywords[7].display_name | Philosophy |
| keywords[8].id | https://openalex.org/keywords/metallurgy |
| keywords[8].score | 0.055381208658218384 |
| keywords[8].display_name | Metallurgy |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2402.01422 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2402.01422 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2402.01422 |
| locations[1].id | doi:10.48550/arxiv.2402.01422 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2402.01422 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5109552527 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Guanwen Feng |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Feng, Guanwen |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5112601365 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Haoran Cheng |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Cheng, Haoran |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5044422446 |
| authorships[2].author.orcid | https://orcid.org/0000-0001-7316-4354 |
| authorships[2].author.display_name | Yunan Li |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Li, Yunan |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5100771254 |
| authorships[3].author.orcid | https://orcid.org/0000-0001-5056-9519 |
| authorships[3].author.display_name | Zhiyuan Ma |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Ma, Zhiyuan |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5111124793 |
| authorships[4].author.orcid | |
| authorships[4].author.display_name | Chaoneng Li |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Li, Chaoneng |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5023284797 |
| authorships[5].author.orcid | |
| authorships[5].author.display_name | Zhihao Qian |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Qian, Zhihao |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5007404362 |
| authorships[6].author.orcid | https://orcid.org/0000-0002-2872-388X |
| authorships[6].author.display_name | Qiguang Miao |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Miao, Qiguang |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5005772506 |
| authorships[7].author.orcid | https://orcid.org/0000-0003-1788-3746 |
| authorships[7].author.display_name | Chi‐Man Pun |
| authorships[7].author_position | last |
| authorships[7].raw_author_name | Pun, Chi-Man |
| authorships[7].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2402.01422 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2024-02-06T00:00:00 |
| display_name | EmoSpeaker: One-shot Fine-grained Emotion-Controlled Talking Face Generation |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11448 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9988999962806702 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1707 |
| primary_topic.subfield.display_name | Computer Vision and Pattern Recognition |
| primary_topic.display_name | Face recognition and analysis |
| related_works | https://openalex.org/W2748952813, https://openalex.org/W4214877189, https://openalex.org/W2074502265, https://openalex.org/W2773965352, https://openalex.org/W2381179799, https://openalex.org/W2334685461, https://openalex.org/W2366718574, https://openalex.org/W2359774528, https://openalex.org/W4298312966, https://openalex.org/W2728912566 |
| cited_by_count | 1 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2402.01422 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2402.01422 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2402.01422 |
| primary_location.id | pmh:oai:arXiv.org:2402.01422 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2402.01422 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2402.01422 |
| publication_date | 2024-02-02 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 54, 61, 71, 106, 121, 145, 163 |
| abstract_inverted_index.3D | 156 |
| abstract_inverted_index.In | 63 |
| abstract_inverted_index.To | 98 |
| abstract_inverted_index.an | 57, 115 |
| abstract_inverted_index.by | 159 |
| abstract_inverted_index.in | 132, 187 |
| abstract_inverted_index.is | 4 |
| abstract_inverted_index.it | 11, 21 |
| abstract_inverted_index.of | 16, 40, 80, 92, 139, 147, 162, 189 |
| abstract_inverted_index.to | 22, 65, 85, 154, 166 |
| abstract_inverted_index.we | 69, 104, 113 |
| abstract_inverted_index.Our | 171 |
| abstract_inverted_index.and | 24, 27, 38, 56, 136, 192 |
| abstract_inverted_index.are | 142, 152 |
| abstract_inverted_index.for | 6 |
| abstract_inverted_index.lip | 94, 193 |
| abstract_inverted_index.our | 176 |
| abstract_inverted_index.the | 13, 17, 35, 78, 86, 90, 133, 160, 168 |
| abstract_inverted_index.3DMM | 148 |
| abstract_inverted_index.This | 76 |
| abstract_inverted_index.face | 184 |
| abstract_inverted_index.more | 100 |
| abstract_inverted_index.only | 53 |
| abstract_inverted_index.over | 129 |
| abstract_inverted_index.that | 47, 175 |
| abstract_inverted_index.this | 67 |
| abstract_inverted_index.audio | 58, 74, 87 |
| abstract_inverted_index.final | 169 |
| abstract_inverted_index.finer | 137 |
| abstract_inverted_index.order | 64 |
| abstract_inverted_index.page: | 196 |
| abstract_inverted_index.tasks | 9 |
| abstract_inverted_index.terms | 188 |
| abstract_inverted_index.using | 52, 120 |
| abstract_inverted_index.facial | 45 |
| abstract_inverted_index.method | 119 |
| abstract_inverted_index.model, | 19 |
| abstract_inverted_index.series | 146 |
| abstract_inverted_index.solely | 83 |
| abstract_inverted_index.these, | 126 |
| abstract_inverted_index.video. | 170 |
| abstract_inverted_index.videos | 135 |
| abstract_inverted_index.visual | 72 |
| abstract_inverted_index.Project | 195 |
| abstract_inverted_index.Through | 125 |
| abstract_inverted_index.achieve | 99 |
| abstract_inverted_index.address | 66 |
| abstract_inverted_index.because | 10 |
| abstract_inverted_index.capture | 26 |
| abstract_inverted_index.content | 81 |
| abstract_inverted_index.control | 3, 118, 128 |
| abstract_inverted_index.crucial | 5 |
| abstract_inverted_index.emotion | 2, 7, 108, 116, 123, 140 |
| abstract_inverted_index.enables | 77 |
| abstract_inverted_index.express | 28 |
| abstract_inverted_index.matrix. | 124 |
| abstract_inverted_index.method, | 178 |
| abstract_inverted_index.methods | 186 |
| abstract_inverted_index.module. | 111 |
| abstract_inverted_index.network | 165 |
| abstract_inverted_index.nuanced | 30 |
| abstract_inverted_index.portray | 49 |
| abstract_inverted_index.precise | 101 |
| abstract_inverted_index.predict | 155 |
| abstract_inverted_index.propose | 70, 114 |
| abstract_inverted_index.quality | 37 |
| abstract_inverted_index.related | 84 |
| abstract_inverted_index.results | 173 |
| abstract_inverted_index.states, | 32 |
| abstract_inverted_index.talking | 183 |
| abstract_inverted_index.thereby | 33 |
| abstract_inverted_index.various | 29 |
| abstract_inverted_index.vectors | 82 |
| abstract_inverted_index.allowing | 20 |
| abstract_inverted_index.content, | 88 |
| abstract_inverted_index.content. | 42 |
| abstract_inverted_index.designed | 153 |
| abstract_inverted_index.enhances | 12 |
| abstract_inverted_index.existing | 181 |
| abstract_inverted_index.followed | 158 |
| abstract_inverted_index.generate | 167 |
| abstract_inverted_index.movement | 95 |
| abstract_inverted_index.networks | 151 |
| abstract_inverted_index.portrait | 55 |
| abstract_inverted_index.presents | 60 |
| abstract_inverted_index.proposed | 177 |
| abstract_inverted_index.effective | 127 |
| abstract_inverted_index.emotional | 31, 36, 50, 102, 130, 182 |
| abstract_inverted_index.enhancing | 89 |
| abstract_inverted_index.generated | 41, 134 |
| abstract_inverted_index.improving | 34 |
| abstract_inverted_index.intensity | 117, 141 |
| abstract_inverted_index.introduce | 105 |
| abstract_inverted_index.obtention | 79 |
| abstract_inverted_index.recording | 59 |
| abstract_inverted_index.rendering | 164 |
| abstract_inverted_index.stability | 91 |
| abstract_inverted_index.variation | 191 |
| abstract_inverted_index.Generating | 43 |
| abstract_inverted_index.accurately | 23, 48 |
| abstract_inverted_index.animations | 46 |
| abstract_inverted_index.capability | 15 |
| abstract_inverted_index.challenge, | 68 |
| abstract_inverted_index.challenge. | 62 |
| abstract_inverted_index.decoupler. | 75 |
| abstract_inverted_index.expression | 131, 190 |
| abstract_inverted_index.expressive | 14 |
| abstract_inverted_index.generation | 8, 150, 185 |
| abstract_inverted_index.generative | 18 |
| abstract_inverted_index.prediction | 110 |
| abstract_inverted_index.subsequent | 93 |
| abstract_inverted_index.EmoSpeaker, | 179 |
| abstract_inverted_index.coefficient | 96, 109, 149 |
| abstract_inverted_index.demonstrate | 174 |
| abstract_inverted_index.expression, | 103 |
| abstract_inverted_index.expressions | 51 |
| abstract_inverted_index.outperforms | 180 |
| abstract_inverted_index.utilization | 161 |
| abstract_inverted_index.Implementing | 0 |
| abstract_inverted_index.experimental | 172 |
| abstract_inverted_index.fine-grained | 1, 44, 107, 122 |
| abstract_inverted_index.predictions. | 97 |
| abstract_inverted_index.Additionally, | 112 |
| abstract_inverted_index.Subsequently, | 144 |
| abstract_inverted_index.accomplished. | 143 |
| abstract_inverted_index.coefficients, | 157 |
| abstract_inverted_index.classification | 138 |
| abstract_inverted_index.comprehensively | 25 |
| abstract_inverted_index.personalization | 39 |
| abstract_inverted_index.attribute-guided | 73 |
| abstract_inverted_index.synchronization. | 194 |
| abstract_inverted_index.https://peterfanfan.github.io/EmoSpeaker/ | 197 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 8 |
| citation_normalized_percentile |
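
The `abstract_inverted_index.*` rows above store the abstract as a map from each token to the word positions where it occurs. The following is a minimal sketch of how such a map could be turned back into running text. It assumes the index has already been loaded as a Python dict of token to list of integer positions; the function name `rebuild_abstract` and the `sample` dict are illustrative and not part of the record. The sample covers only positions 0 to 9, copied from the rows above.

```python
from typing import Dict, List


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Rebuild text from a token -> positions map (hypothetical helper)."""
    # Invert the map: position -> token.
    by_position: Dict[int, str] = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    # Emit tokens in ascending position order, separated by single spaces.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Truncated sample: only the entries for word positions 0..9 of the table.
sample = {
    "Implementing": [0],
    "fine-grained": [1],
    "emotion": [2, 7],
    "control": [3],
    "is": [4],
    "crucial": [5],
    "for": [6],
    "generation": [8],
    "tasks": [9],
}

print(rebuild_abstract(sample))
# Implementing fine-grained emotion control is crucial for emotion generation tasks
```

Tokens keep their original punctuation and capitalization (for example `model,` and `Generating`), so joining them on spaces is enough to recover the abstract as displayed at the top of this record.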