C2G2: Controllable Co-speech Gesture Generation with Latent Diffusion Model
2023 · Open Access · DOI: https://doi.org/10.48550/arxiv.2308.15016
Co-speech gesture generation is crucial for automatic digital avatar animation. However, existing methods suffer from issues such as unstable training and temporal inconsistency, particularly in generating high-fidelity and comprehensive gestures. Additionally, these methods lack effective control over speaker identity and temporal editing of the generated gestures. Focusing on capturing temporal latent information and applying practical controlling, we propose a Controllable Co-speech Gesture Generation framework, named C2G2. Specifically, we propose a two-stage temporal dependency enhancement strategy motivated by latent diffusion models. We further introduce two key features to C2G2, namely a speaker-specific decoder to generate speaker-related real-length skeletons and a repainting strategy for flexible gesture generation/editing. Extensive experiments on benchmark gesture datasets verify the effectiveness of our proposed C2G2 compared with several state-of-the-art baselines. The link of the project demo page can be found at https://c2g2-gesture.github.io/c2_gesture
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2308.15016, https://arxiv.org/pdf/2308.15016
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4386302287
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4386302287 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2308.15016 (Digital Object Identifier)
- Title: C2G2: Controllable Co-speech Gesture Generation with Latent Diffusion Model (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2023 (year of publication)
- Publication date: 2023-08-29 (full publication date if available)
- Authors: Longbin Ji, Pengfei Wei, Yi Ren, Jinglin Liu, Chen Zhang, Xiang Yin (list of authors in order)
- Landing page: https://arxiv.org/abs/2308.15016 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2308.15016 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2308.15016 (direct OA link when available)
- Concepts: Gesture, Computer science, Speech recognition, Animation, Benchmark (surveying), Gesture recognition, Artificial intelligence, Computer graphics (images), Geodesy, Geography (top fields/topics attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4386302287 |
| doi | https://doi.org/10.48550/arxiv.2308.15016 |
| ids.doi | https://doi.org/10.48550/arxiv.2308.15016 |
| ids.openalex | https://openalex.org/W4386302287 |
| fwci | |
| type | preprint |
| title | C2G2: Controllable Co-speech Gesture Generation with Latent Diffusion Model |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T12290 |
| topics[0].field.id | https://openalex.org/fields/22 |
| topics[0].field.display_name | Engineering |
| topics[0].score | 0.9994000196456909 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/2207 |
| topics[0].subfield.display_name | Control and Systems Engineering |
| topics[0].display_name | Human Motion and Animation |
| topics[1].id | https://openalex.org/T10812 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9975000023841858 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1707 |
| topics[1].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[1].display_name | Human Pose and Action Recognition |
| topics[2].id | https://openalex.org/T11714 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9909999966621399 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1707 |
| topics[2].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[2].display_name | Multimodal Machine Learning Applications |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C207347870 |
| concepts[0].level | 2 |
| concepts[0].score | 0.9451359510421753 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q371174 |
| concepts[0].display_name | Gesture |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.822617769241333 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C28490314 |
| concepts[2].level | 1 |
| concepts[2].score | 0.5837609767913818 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q189436 |
| concepts[2].display_name | Speech recognition |
| concepts[3].id | https://openalex.org/C502989409 |
| concepts[3].level | 2 |
| concepts[3].score | 0.5163809061050415 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q11425 |
| concepts[3].display_name | Animation |
| concepts[4].id | https://openalex.org/C185798385 |
| concepts[4].level | 2 |
| concepts[4].score | 0.5147714614868164 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q1161707 |
| concepts[4].display_name | Benchmark (surveying) |
| concepts[5].id | https://openalex.org/C159437735 |
| concepts[5].level | 3 |
| concepts[5].score | 0.43946120142936707 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q1519524 |
| concepts[5].display_name | Gesture recognition |
| concepts[6].id | https://openalex.org/C154945302 |
| concepts[6].level | 1 |
| concepts[6].score | 0.404893696308136 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[6].display_name | Artificial intelligence |
| concepts[7].id | https://openalex.org/C121684516 |
| concepts[7].level | 1 |
| concepts[7].score | 0.0991608202457428 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q7600677 |
| concepts[7].display_name | Computer graphics (images) |
| concepts[8].id | https://openalex.org/C13280743 |
| concepts[8].level | 1 |
| concepts[8].score | 0.0 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q131089 |
| concepts[8].display_name | Geodesy |
| concepts[9].id | https://openalex.org/C205649164 |
| concepts[9].level | 0 |
| concepts[9].score | 0.0 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q1071 |
| concepts[9].display_name | Geography |
| keywords[0].id | https://openalex.org/keywords/gesture |
| keywords[0].score | 0.9451359510421753 |
| keywords[0].display_name | Gesture |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.822617769241333 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/speech-recognition |
| keywords[2].score | 0.5837609767913818 |
| keywords[2].display_name | Speech recognition |
| keywords[3].id | https://openalex.org/keywords/animation |
| keywords[3].score | 0.5163809061050415 |
| keywords[3].display_name | Animation |
| keywords[4].id | https://openalex.org/keywords/benchmark |
| keywords[4].score | 0.5147714614868164 |
| keywords[4].display_name | Benchmark (surveying) |
| keywords[5].id | https://openalex.org/keywords/gesture-recognition |
| keywords[5].score | 0.43946120142936707 |
| keywords[5].display_name | Gesture recognition |
| keywords[6].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[6].score | 0.404893696308136 |
| keywords[6].display_name | Artificial intelligence |
| keywords[7].id | https://openalex.org/keywords/computer-graphics |
| keywords[7].score | 0.0991608202457428 |
| keywords[7].display_name | Computer graphics (images) |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2308.15016 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2308.15016 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2308.15016 |
| locations[1].id | doi:10.48550/arxiv.2308.15016 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2308.15016 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5102550751 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Longbin Ji |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Ji, Longbin |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5015951797 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-1155-6085 |
| authorships[1].author.display_name | Pengfei Wei |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Wei, Pengfei |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5075486968 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-3665-700X |
| authorships[2].author.display_name | Yi Ren |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Ren, Yi |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5065126806 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-9905-3887 |
| authorships[3].author.display_name | Jinglin Liu |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Liu, Jinglin |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5100374132 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-6570-5947 |
| authorships[4].author.display_name | Chen Zhang |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Zhang, Chen |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5100446849 |
| authorships[5].author.orcid | https://orcid.org/0000-0002-6096-9943 |
| authorships[5].author.display_name | Xiang Yin |
| authorships[5].author_position | last |
| authorships[5].raw_author_name | Yin, Xiang |
| authorships[5].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2308.15016 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2023-08-31T00:00:00 |
| display_name | C2G2: Controllable Co-speech Gesture Generation with Latent Diffusion Model |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T12290 |
| primary_topic.field.id | https://openalex.org/fields/22 |
| primary_topic.field.display_name | Engineering |
| primary_topic.score | 0.9994000196456909 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/2207 |
| primary_topic.subfield.display_name | Control and Systems Engineering |
| primary_topic.display_name | Human Motion and Animation |
| related_works | https://openalex.org/W2902873204, https://openalex.org/W2185750513, https://openalex.org/W2010878661, https://openalex.org/W3147379364, https://openalex.org/W2026258298, https://openalex.org/W3204639664, https://openalex.org/W2970836791, https://openalex.org/W2805039731, https://openalex.org/W2989699735, https://openalex.org/W4322710567 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2308.15016 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2308.15016 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2308.15016 |
| primary_location.id | pmh:oai:arXiv.org:2308.15016 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2308.15016 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2308.15016 |
| publication_date | 2023-08-29 |
| publication_year | 2023 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 58, 69, 89, 98 |
| abstract_inverted_index.We | 80 |
| abstract_inverted_index.as | 17 |
| abstract_inverted_index.at | 133 |
| abstract_inverted_index.be | 131 |
| abstract_inverted_index.by | 76 |
| abstract_inverted_index.in | 24 |
| abstract_inverted_index.is | 3 |
| abstract_inverted_index.of | 42, 114, 125 |
| abstract_inverted_index.on | 47, 107 |
| abstract_inverted_index.to | 86, 92 |
| abstract_inverted_index.we | 56, 67 |
| abstract_inverted_index.The | 123 |
| abstract_inverted_index.and | 20, 27, 39, 52, 97 |
| abstract_inverted_index.can | 130 |
| abstract_inverted_index.for | 5, 101 |
| abstract_inverted_index.key | 84 |
| abstract_inverted_index.our | 115 |
| abstract_inverted_index.the | 43, 112, 126 |
| abstract_inverted_index.two | 83 |
| abstract_inverted_index.C2G2 | 117 |
| abstract_inverted_index.demo | 128 |
| abstract_inverted_index.from | 14 |
| abstract_inverted_index.lack | 33 |
| abstract_inverted_index.link | 124 |
| abstract_inverted_index.over | 36 |
| abstract_inverted_index.page | 129 |
| abstract_inverted_index.such | 16 |
| abstract_inverted_index.with | 119 |
| abstract_inverted_index.C2G2, | 87 |
| abstract_inverted_index.C2G2. | 65 |
| abstract_inverted_index.found | 132 |
| abstract_inverted_index.named | 64 |
| abstract_inverted_index.these | 31 |
| abstract_inverted_index.avatar | 8 |
| abstract_inverted_index.issues | 15 |
| abstract_inverted_index.latent | 50, 77 |
| abstract_inverted_index.namely | 88 |
| abstract_inverted_index.suffer | 13 |
| abstract_inverted_index.verify | 111 |
| abstract_inverted_index.Gesture | 61 |
| abstract_inverted_index.control | 35 |
| abstract_inverted_index.crucial | 4 |
| abstract_inverted_index.decoder | 91 |
| abstract_inverted_index.digital | 7 |
| abstract_inverted_index.editing | 41 |
| abstract_inverted_index.further | 81 |
| abstract_inverted_index.gesture | 1, 103, 109 |
| abstract_inverted_index.methods | 12, 32 |
| abstract_inverted_index.models. | 79 |
| abstract_inverted_index.project | 127 |
| abstract_inverted_index.propose | 57, 68 |
| abstract_inverted_index.several | 120 |
| abstract_inverted_index.speaker | 37 |
| abstract_inverted_index.Focusing | 46 |
| abstract_inverted_index.However, | 10 |
| abstract_inverted_index.applying | 53 |
| abstract_inverted_index.compared | 118 |
| abstract_inverted_index.datasets | 110 |
| abstract_inverted_index.existing | 11 |
| abstract_inverted_index.features | 85 |
| abstract_inverted_index.flexible | 102 |
| abstract_inverted_index.generate | 93 |
| abstract_inverted_index.identity | 38 |
| abstract_inverted_index.proposed | 116 |
| abstract_inverted_index.strategy | 74, 100 |
| abstract_inverted_index.temporal | 21, 40, 49, 71 |
| abstract_inverted_index.training | 19 |
| abstract_inverted_index.unstable | 18 |
| abstract_inverted_index.Co-speech | 0, 60 |
| abstract_inverted_index.Extensive | 105 |
| abstract_inverted_index.automatic | 6 |
| abstract_inverted_index.benchmark | 108 |
| abstract_inverted_index.capturing | 48 |
| abstract_inverted_index.diffusion | 78 |
| abstract_inverted_index.effective | 34 |
| abstract_inverted_index.generated | 44 |
| abstract_inverted_index.gestures. | 29, 45 |
| abstract_inverted_index.introduce | 82 |
| abstract_inverted_index.motivated | 75 |
| abstract_inverted_index.practical | 54 |
| abstract_inverted_index.skeletons | 96 |
| abstract_inverted_index.two-stage | 70 |
| abstract_inverted_index.Generation | 62 |
| abstract_inverted_index.animation. | 9 |
| abstract_inverted_index.baselines. | 122 |
| abstract_inverted_index.dependency | 72 |
| abstract_inverted_index.framework, | 63 |
| abstract_inverted_index.generating | 25 |
| abstract_inverted_index.generation | 2 |
| abstract_inverted_index.repainting | 99 |
| abstract_inverted_index.enhancement | 73 |
| abstract_inverted_index.experiments | 106 |
| abstract_inverted_index.information | 51 |
| abstract_inverted_index.real-length | 95 |
| abstract_inverted_index.Controllable | 59 |
| abstract_inverted_index.controlling, | 55 |
| abstract_inverted_index.particularly | 23 |
| abstract_inverted_index.Additionally, | 30 |
| abstract_inverted_index.Specifically, | 66 |
| abstract_inverted_index.comprehensive | 28 |
| abstract_inverted_index.effectiveness | 113 |
| abstract_inverted_index.high-fidelity | 26 |
| abstract_inverted_index.inconsistency, | 22 |
| abstract_inverted_index.speaker-related | 94 |
| abstract_inverted_index.speaker-specific | 90 |
| abstract_inverted_index.state-of-the-art | 121 |
| abstract_inverted_index.generation/editing. | 104 |
| abstract_inverted_index.https://c2g2-gesture.github.io/c2_gesture | 134 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 6 |
| citation_normalized_percentile |
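
The `abstract_inverted_index.*` rows in the payload store the abstract as a mapping from each token to the zero-based positions where it occurs. The following is a minimal sketch of how such a mapping could be turned back into running text, assuming the rows have already been loaded into a plain Python dict. The function name `rebuild_abstract` and the `sample` dict are illustrative and not part of OpenAlex; `sample` covers only positions 0 through 9 of the index above, so later positions of repeated tokens (for example the second occurrence of "Co-speech" at position 60) are deliberately left out.

```python
def rebuild_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild text from a token -> positions mapping (hypothetical helper)."""
    by_position: dict[int, str] = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            # Each position holds exactly one token.
            by_position[pos] = token
    # Join tokens in ascending position order.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Illustrative subset: the first ten positions from the payload table.
sample = {
    "Co-speech": [0],
    "gesture": [1],
    "generation": [2],
    "is": [3],
    "crucial": [4],
    "for": [5],
    "automatic": [6],
    "digital": [7],
    "avatar": [8],
    "animation.": [9],
}

print(rebuild_abstract(sample))
```

Because punctuation is stored as part of each token, joining on single spaces is enough to recover the sentence for this subset. Original whitespace such as line breaks is not recoverable from the index.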