CLTalk: Speech-Driven 3D Facial Animation with Contrastive Learning
Xitie Zhang, Suping Wu · 2024 · Open Access
DOI: https://doi.org/10.1145/3652583.3657625
Speech-driven 3D facial animation aims to generate realistic and vivid 3D facial animations from speech. However, the scarcity of labeled data and the tendency of existing methods to treat this cross-modal mapping as a pure regression task can result in inadequate learning of discriminative features from the speech. This deficiency often leads to excessively smooth facial movements, particularly around the lips. To address these issues and enhance the accuracy of lip generation while reducing reliance on labeled data, we propose CLTalk, a framework based on a contrastive learning strategy. The framework comprises three main parts: a temporal-domain contrastive learning strategy that facilitates learning discriminative features from different audio frames, a correlation learning method that keeps the distribution of audio features consistent with the mesh labels, and a mouth opening angle constraint that further improves the accuracy of lip generation. Extensive experiments on challenging, widely used datasets demonstrate the effectiveness of our method compared with the state of the art.
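The abstract only describes the three components at a high level. As a rough illustration of the first and third, the sketch below shows a frame-level InfoNCE-style contrastive loss over audio features and a mouth-opening-angle penalty. This is a minimal numpy sketch based on common formulations; the function names, vertex indices, and angle definition are assumptions, not the paper's actual implementation.

```python
import numpy as np

def temporal_contrastive_loss(z_a, z_b, tau=0.07):
    """InfoNCE-style loss over audio frames: frame t in view A is the positive
    for frame t in view B; all other frames act as negatives.
    z_a, z_b: (T, D) per-frame audio embeddings from two views.
    Assumed formulation, not CLTalk's exact loss."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / tau                      # (T, T) cosine similarities
    m = logits.max(axis=1, keepdims=True)           # stabilise the softmax
    log_prob = logits - (m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True)))
    return -np.mean(np.diag(log_prob))              # positives lie on the diagonal

def mouth_opening_angle(verts, corner_idx, upper_idx, lower_idx):
    """Angle at a mouth-corner vertex between the vectors to an upper-lip and a
    lower-lip vertex. The vertex indices are hypothetical placeholders."""
    u = verts[upper_idx] - verts[corner_idx]
    v = verts[lower_idx] - verts[corner_idx]
    cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)
    return np.arccos(np.clip(cos, -1.0, 1.0))

def angle_constraint(pred_verts, gt_verts, corner_idx=0, upper_idx=1, lower_idx=2):
    """Penalise the gap between predicted and ground-truth mouth opening angles."""
    return abs(mouth_opening_angle(pred_verts, corner_idx, upper_idx, lower_idx)
               - mouth_opening_angle(gt_verts, corner_idx, upper_idx, lower_idx))

# Toy usage with random features and vertices
rng = np.random.default_rng(0)
loss_c = temporal_contrastive_loss(rng.normal(size=(50, 128)), rng.normal(size=(50, 128)))
loss_a = angle_constraint(rng.normal(size=(10, 3)), rng.normal(size=(10, 3)))
```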
Overview
- Type: article
- Language: en
- Landing Page: https://doi.org/10.1145/3652583.3657625
- PDF: https://dl.acm.org/doi/pdf/10.1145/3652583.3657625
- OA Status: gold
- References: 14
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4399418417
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4399418417 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.1145/3652583.3657625 (Digital Object Identifier)
- Title: CLTalk: Speech-Driven 3D Facial Animation with Contrastive Learning (work title)
- Type: article (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024
- Publication date: 2024-05-30 (full publication date if available)
- Authors: Xitie Zhang, Suping Wu (list of authors in order)
- Landing page: https://doi.org/10.1145/3652583.3657625 (publisher landing page)
- PDF URL: https://dl.acm.org/doi/pdf/10.1145/3652583.3657625 (direct link to the full-text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: gold (open access status per OpenAlex)
- OA URL: https://dl.acm.org/doi/pdf/10.1145/3652583.3657625 (direct OA link when available)
- Concepts: Computer science, Discriminative model, Artificial intelligence, Animation, Speech recognition, Computer facial animation, Pattern recognition (psychology), Computer animation, Computer graphics (images) (top concepts/topics attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- References (count): 14 (number of works referenced by this work)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
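
The fields above (and the full payload below) can be pulled directly from the OpenAlex API. A minimal sketch, assuming the public works endpoint at api.openalex.org and no authentication:

```python
import json
import urllib.request

# Fetch the raw OpenAlex record for this work.
# Assumes the public endpoint https://api.openalex.org/works/{id}; no API key needed.
work_id = "W4399418417"
with urllib.request.urlopen(f"https://api.openalex.org/works/{work_id}") as resp:
    work = json.load(resp)

print(work["display_name"])                      # title
print(work["doi"])                               # DOI URL
print(work["open_access"]["oa_status"])          # "gold"
print([a["author"]["display_name"] for a in work["authorships"]])  # authors
```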
Full payload
| id | https://openalex.org/W4399418417 |
|---|---|
| doi | https://doi.org/10.1145/3652583.3657625 |
| ids.doi | https://doi.org/10.1145/3652583.3657625 |
| ids.openalex | https://openalex.org/W4399418417 |
| fwci | 0.0 |
| type | article |
| title | CLTalk: Speech-Driven 3D Facial Animation with Contrastive Learning |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | 1179 |
| biblio.first_page | 1175 |
| topics[0].id | https://openalex.org/T11448 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9991000294685364 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1707 |
| topics[0].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[0].display_name | Face recognition and analysis |
| topics[1].id | https://openalex.org/T10860 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9988999962806702 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1711 |
| topics[1].subfield.display_name | Signal Processing |
| topics[1].display_name | Speech and Audio Processing |
| topics[2].id | https://openalex.org/T12301 |
| topics[2].field.id | https://openalex.org/fields/27 |
| topics[2].field.display_name | Medicine |
| topics[2].score | 0.9735000133514404 |
| topics[2].domain.id | https://openalex.org/domains/4 |
| topics[2].domain.display_name | Health Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/2728 |
| topics[2].subfield.display_name | Neurology |
| topics[2].display_name | Facial Nerve Paralysis Treatment and Research |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.8134422898292542 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C97931131 |
| concepts[1].level | 2 |
| concepts[1].score | 0.7865310907363892 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q5282087 |
| concepts[1].display_name | Discriminative model |
| concepts[2].id | https://openalex.org/C154945302 |
| concepts[2].level | 1 |
| concepts[2].score | 0.5786063075065613 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[2].display_name | Artificial intelligence |
| concepts[3].id | https://openalex.org/C502989409 |
| concepts[3].level | 2 |
| concepts[3].score | 0.5326076745986938 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q11425 |
| concepts[3].display_name | Animation |
| concepts[4].id | https://openalex.org/C28490314 |
| concepts[4].level | 1 |
| concepts[4].score | 0.5164411067962646 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q189436 |
| concepts[4].display_name | Speech recognition |
| concepts[5].id | https://openalex.org/C138591656 |
| concepts[5].level | 4 |
| concepts[5].score | 0.4519840478897095 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q5157538 |
| concepts[5].display_name | Computer facial animation |
| concepts[6].id | https://openalex.org/C153180895 |
| concepts[6].level | 2 |
| concepts[6].score | 0.32585465908050537 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q7148389 |
| concepts[6].display_name | Pattern recognition (psychology) |
| concepts[7].id | https://openalex.org/C69369342 |
| concepts[7].level | 3 |
| concepts[7].score | 0.24872982501983643 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q1401416 |
| concepts[7].display_name | Computer animation |
| concepts[8].id | https://openalex.org/C121684516 |
| concepts[8].level | 1 |
| concepts[8].score | 0.0 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q7600677 |
| concepts[8].display_name | Computer graphics (images) |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.8134422898292542 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/discriminative-model |
| keywords[1].score | 0.7865310907363892 |
| keywords[1].display_name | Discriminative model |
| keywords[2].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[2].score | 0.5786063075065613 |
| keywords[2].display_name | Artificial intelligence |
| keywords[3].id | https://openalex.org/keywords/animation |
| keywords[3].score | 0.5326076745986938 |
| keywords[3].display_name | Animation |
| keywords[4].id | https://openalex.org/keywords/speech-recognition |
| keywords[4].score | 0.5164411067962646 |
| keywords[4].display_name | Speech recognition |
| keywords[5].id | https://openalex.org/keywords/computer-facial-animation |
| keywords[5].score | 0.4519840478897095 |
| keywords[5].display_name | Computer facial animation |
| keywords[6].id | https://openalex.org/keywords/pattern-recognition |
| keywords[6].score | 0.32585465908050537 |
| keywords[6].display_name | Pattern recognition (psychology) |
| keywords[7].id | https://openalex.org/keywords/computer-animation |
| keywords[7].score | 0.24872982501983643 |
| keywords[7].display_name | Computer animation |
| language | en |
| locations[0].id | doi:10.1145/3652583.3657625 |
| locations[0].is_oa | True |
| locations[0].source | |
| locations[0].license | |
| locations[0].pdf_url | https://dl.acm.org/doi/pdf/10.1145/3652583.3657625 |
| locations[0].version | publishedVersion |
| locations[0].raw_type | proceedings-article |
| locations[0].license_id | |
| locations[0].is_accepted | True |
| locations[0].is_published | True |
| locations[0].raw_source_name | Proceedings of the 2024 International Conference on Multimedia Retrieval |
| locations[0].landing_page_url | https://doi.org/10.1145/3652583.3657625 |
| indexed_in | crossref |
| authorships[0].author.id | https://openalex.org/A5019097648 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-8114-9119 |
| authorships[0].author.display_name | Xitie Zhang |
| authorships[0].countries | CN |
| authorships[0].affiliations[0].institution_ids | https://openalex.org/I21642278 |
| authorships[0].affiliations[0].raw_affiliation_string | School of Information Engineering, Ningxia University, Yinchuan, Ningxia, China |
| authorships[0].institutions[0].id | https://openalex.org/I21642278 |
| authorships[0].institutions[0].ror | https://ror.org/04j7b2v61 |
| authorships[0].institutions[0].type | education |
| authorships[0].institutions[0].lineage | https://openalex.org/I21642278 |
| authorships[0].institutions[0].country_code | CN |
| authorships[0].institutions[0].display_name | Ningxia University |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Xitie Zhang |
| authorships[0].is_corresponding | False |
| authorships[0].raw_affiliation_strings | School of Information Engineering, Ningxia University, Yinchuan, Ningxia, China |
| authorships[1].author.id | https://openalex.org/A5101865021 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-5207-1802 |
| authorships[1].author.display_name | Suping Wu |
| authorships[1].countries | CN |
| authorships[1].affiliations[0].institution_ids | https://openalex.org/I21642278 |
| authorships[1].affiliations[0].raw_affiliation_string | School of Information Engineering, Ningxia University, Yinchuan, Ningxia, China |
| authorships[1].institutions[0].id | https://openalex.org/I21642278 |
| authorships[1].institutions[0].ror | https://ror.org/04j7b2v61 |
| authorships[1].institutions[0].type | education |
| authorships[1].institutions[0].lineage | https://openalex.org/I21642278 |
| authorships[1].institutions[0].country_code | CN |
| authorships[1].institutions[0].display_name | Ningxia University |
| authorships[1].author_position | last |
| authorships[1].raw_author_name | Suping Wu |
| authorships[1].is_corresponding | False |
| authorships[1].raw_affiliation_strings | School of Information Engineering, Ningxia University, Yinchuan, Ningxia, China |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://dl.acm.org/doi/pdf/10.1145/3652583.3657625 |
| open_access.oa_status | gold |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | CLTalk: Speech-Driven 3D Facial Animation with Contrastive Learning |
| has_fulltext | True |
| is_retracted | False |
| updated_date | 2025-11-06T03:46:38.306776 |
| primary_topic.id | https://openalex.org/T11448 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9991000294685364 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1707 |
| primary_topic.subfield.display_name | Computer Vision and Pattern Recognition |
| primary_topic.display_name | Face recognition and analysis |
| related_works | https://openalex.org/W1544039745, https://openalex.org/W2404514746, https://openalex.org/W1652783584, https://openalex.org/W2532377291, https://openalex.org/W2082783427, https://openalex.org/W2121378366, https://openalex.org/W2999276620, https://openalex.org/W2617644139, https://openalex.org/W2535923857, https://openalex.org/W4400097232 |
| cited_by_count | 0 |
| locations_count | 1 |
| best_oa_location.id | doi:10.1145/3652583.3657625 |
| best_oa_location.is_oa | True |
| best_oa_location.source | |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://dl.acm.org/doi/pdf/10.1145/3652583.3657625 |
| best_oa_location.version | publishedVersion |
| best_oa_location.raw_type | proceedings-article |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | True |
| best_oa_location.is_published | True |
| best_oa_location.raw_source_name | Proceedings of the 2024 International Conference on Multimedia Retrieval |
| best_oa_location.landing_page_url | https://doi.org/10.1145/3652583.3657625 |
| primary_location.id | doi:10.1145/3652583.3657625 |
| primary_location.is_oa | True |
| primary_location.source | |
| primary_location.license | |
| primary_location.pdf_url | https://dl.acm.org/doi/pdf/10.1145/3652583.3657625 |
| primary_location.version | publishedVersion |
| primary_location.raw_type | proceedings-article |
| primary_location.license_id | |
| primary_location.is_accepted | True |
| primary_location.is_published | True |
| primary_location.raw_source_name | Proceedings of the 2024 International Conference on Multimedia Retrieval |
| primary_location.landing_page_url | https://doi.org/10.1145/3652583.3657625 |
| publication_date | 2024-05-30 |
| publication_year | 2024 |
| referenced_works | https://openalex.org/W2981263323, https://openalex.org/W4200630629, https://openalex.org/W2154961933, https://openalex.org/W3035524453, https://openalex.org/W2739192055, https://openalex.org/W3175779516, https://openalex.org/W2769666294, https://openalex.org/W4387967971, https://openalex.org/W4390872742, https://openalex.org/W2745771616, https://openalex.org/W3154411171, https://openalex.org/W2737658251, https://openalex.org/W4386076250, https://openalex.org/W1990883837 |
| referenced_works_count | 14 |
| abstract_inverted_index | inverted index of the abstract (token → word positions); the plain text matches the abstract shown above (see the reconstruction sketch after this table) |
| cited_by_percentile_year | |
| countries_distinct_count | 1 |
| institutions_distinct_count | 2 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/4 |
| sustainable_development_goals[0].score | 0.47999998927116394 |
| sustainable_development_goals[0].display_name | Quality Education |
| citation_normalized_percentile.value | 0.0944868 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | False |
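
OpenAlex ships the abstract as the abstract_inverted_index summarised in the table above: a map from each token to the word positions where it occurs. The sketch below shows one way to rebuild the plain-text abstract from that structure; the sample dictionary contains only a few entries of this work's index, so the output is just a fragment.

```python
def abstract_from_inverted_index(inv_index):
    """Rebuild the plain-text abstract from an OpenAlex inverted index
    (token -> list of word positions)."""
    positions = [(pos, token) for token, idxs in inv_index.items() for pos in idxs]
    return " ".join(token for _, token in sorted(positions))

# A few entries taken from this work's abstract_inverted_index
sample = {"Speech-driven": [0], "3D": [1, 10], "animation": [3], "aims": [4]}
print(abstract_from_inverted_index(sample))
# -> "Speech-driven 3D animation aims 3D"
```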