Cross-Modality and Within-Modality Regularization for Audio-Visual DeepFake Detection
2024 · Open Access
· DOI: https://doi.org/10.48550/arxiv.2401.05746
Audio-visual deepfake detection scrutinizes manipulations in public video using complementary multimodal cues. Current methods, which train on fused multimodal data for multimodal targets, face challenges due to uncertainties and inconsistencies in learned representations caused by independent modality manipulations in deepfake videos. To address this, we propose cross-modality and within-modality regularization to preserve modality distinctions during multimodal representation learning. Our approach includes an audio-visual transformer module for modality correspondence and a cross-modality regularization module that aligns paired audio-visual signals while preserving modality distinctions. Simultaneously, a within-modality regularization module refines unimodal representations with modality-specific targets to retain modality-specific details. Experimental results on the public audio-visual dataset, FakeAVCeleb, demonstrate the effectiveness and competitiveness of our approach.
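The abstract names three training signals (a fused multimodal target, a cross-modality alignment term, and within-modality targets per stream) but does not give their formulas. The sketch below is only one plausible way to combine such terms, not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the cosine-margin form of the alignment term, the rule that a pair counts as "matching" only when neither stream is manipulated, and the weights `lam_cross` and `lam_within`.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample; target is a class index."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)


def cross_modality_term(audio_emb, visual_emb, matching, margin=0.5):
    """Hypothetical alignment term (assumed form, not from the paper):
    pull matching audio/visual pairs together, push other pairs
    below a similarity margin."""
    sim = cosine(audio_emb, visual_emb)
    if matching:
        return 1.0 - sim
    return max(0.0, sim - margin)


def total_loss(fused_logits, audio_logits, visual_logits,
               audio_emb, visual_emb, audio_fake, visual_fake,
               lam_cross=1.0, lam_within=1.0):
    """Assumed combination: multimodal loss + weighted regularizers."""
    # Multimodal target: the clip is fake if either stream is fake.
    video_fake = int(audio_fake or visual_fake)
    main = cross_entropy(fused_logits, video_fake)
    # Within-modality: each unimodal head gets its own target.
    within = (cross_entropy(audio_logits, int(audio_fake))
              + cross_entropy(visual_logits, int(visual_fake)))
    # Cross-modality: assumption that only untouched pairs "match".
    matching = not audio_fake and not visual_fake
    cross = cross_modality_term(audio_emb, visual_emb, matching)
    return main + lam_cross * cross + lam_within * within


if __name__ == "__main__":
    # A real-audio / fake-video clip with toy logits and embeddings.
    loss = total_loss([0.2, 1.1], [1.5, -0.3], [-0.4, 0.9],
                      [1.0, 0.0], [0.6, 0.8],
                      audio_fake=False, visual_fake=True)
    print(round(loss, 4))
```

The point of keeping the unimodal targets separate from the fused target is that a clip with real audio and fake video still gives the audio head a "real" label, which is one way to read "preserve modality distinctions" in the abstract.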
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2401.05746, https://arxiv.org/pdf/2401.05746
- OA Status: green
- Cited By: 1
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4390833316
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4390833316 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2401.05746 (Digital Object Identifier)
- Title: Cross-Modality and Within-Modality Regularization for Audio-Visual DeepFake Detection (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-01-11 (full publication date if available)
- Authors: Heqing Zou, Meng Shen, Yu‐Chen Hu, Chen Chen, Eng Siong Chng, Deepu Rajan (list of authors in order)
- Landing page: https://arxiv.org/abs/2401.05746 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2401.05746 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2401.05746 (direct OA link when available)
- Concepts: Modality (human–computer interaction), Computer science, Audio visual, Regularization (linguistics), Artificial intelligence, Multimedia (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 1 (total citation count in OpenAlex)
- Citations by year (recent): 2025: 1 (per-year citation counts, last 5 years)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4390833316 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2401.05746 |
| ids.doi | https://doi.org/10.48550/arxiv.2401.05746 |
| ids.openalex | https://openalex.org/W4390833316 |
| fwci | |
| type | preprint |
| title | Cross-Modality and Within-Modality Regularization for Audio-Visual DeepFake Detection |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T12357 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9922999739646912 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1707 |
| topics[0].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[0].display_name | Digital Media Forensic Detection |
| topics[1].id | https://openalex.org/T10860 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9907000064849854 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1711 |
| topics[1].subfield.display_name | Signal Processing |
| topics[1].display_name | Speech and Audio Processing |
| topics[2].id | https://openalex.org/T11309 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9876999855041504 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1711 |
| topics[2].subfield.display_name | Signal Processing |
| topics[2].display_name | Music and Audio Processing |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C2780226545 |
| concepts[0].level | 2 |
| concepts[0].score | 0.844016432762146 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q6888030 |
| concepts[0].display_name | Modality (human–computer interaction) |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.6616824865341187 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C3017588708 |
| concepts[2].level | 2 |
| concepts[2].score | 0.6092957854270935 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q758901 |
| concepts[2].display_name | Audio visual |
| concepts[3].id | https://openalex.org/C2776135515 |
| concepts[3].level | 2 |
| concepts[3].score | 0.5948255062103271 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q17143721 |
| concepts[3].display_name | Regularization (linguistics) |
| concepts[4].id | https://openalex.org/C154945302 |
| concepts[4].level | 1 |
| concepts[4].score | 0.5285191535949707 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[4].display_name | Artificial intelligence |
| concepts[5].id | https://openalex.org/C49774154 |
| concepts[5].level | 1 |
| concepts[5].score | 0.10773485898971558 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q131765 |
| concepts[5].display_name | Multimedia |
| keywords[0].id | https://openalex.org/keywords/modality |
| keywords[0].score | 0.844016432762146 |
| keywords[0].display_name | Modality (human–computer interaction) |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.6616824865341187 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/audio-visual |
| keywords[2].score | 0.6092957854270935 |
| keywords[2].display_name | Audio visual |
| keywords[3].id | https://openalex.org/keywords/regularization |
| keywords[3].score | 0.5948255062103271 |
| keywords[3].display_name | Regularization (linguistics) |
| keywords[4].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[4].score | 0.5285191535949707 |
| keywords[4].display_name | Artificial intelligence |
| keywords[5].id | https://openalex.org/keywords/multimedia |
| keywords[5].score | 0.10773485898971558 |
| keywords[5].display_name | Multimedia |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2401.05746 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2401.05746 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2401.05746 |
| locations[1].id | doi:10.48550/arxiv.2401.05746 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2401.05746 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5023152132 |
| authorships[0].author.orcid | https://orcid.org/0000-0003-0038-2822 |
| authorships[0].author.display_name | Heqing Zou |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Zou, Heqing |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5047030842 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-1867-0972 |
| authorships[1].author.display_name | Meng Shen |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Shen, Meng |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5074544822 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-5055-3645 |
| authorships[2].author.display_name | Yu‐Chen Hu |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Hu, Yuchen |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5100418534 |
| authorships[3].author.orcid | https://orcid.org/0000-0003-1064-4961 |
| authorships[3].author.display_name | Chen Chen |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Chen, Chen |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5070872826 |
| authorships[4].author.orcid | https://orcid.org/0000-0001-6257-7399 |
| authorships[4].author.display_name | Eng Siong Chng |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Chng, Eng Siong |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5009372982 |
| authorships[5].author.orcid | https://orcid.org/0000-0001-7788-8368 |
| authorships[5].author.display_name | Deepu Rajan |
| authorships[5].author_position | last |
| authorships[5].raw_author_name | Rajan, Deepu |
| authorships[5].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2401.05746 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2024-01-13T00:00:00 |
| display_name | Cross-Modality and Within-Modality Regularization for Audio-Visual DeepFake Detection |
| has_fulltext | True |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T12357 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9922999739646912 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1707 |
| primary_topic.subfield.display_name | Computer Vision and Pattern Recognition |
| primary_topic.display_name | Digital Media Forensic Detection |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2748952813, https://openalex.org/W2271369634, https://openalex.org/W3147472394, https://openalex.org/W2047100085, https://openalex.org/W2350550760, https://openalex.org/W578794879, https://openalex.org/W2625296515, https://openalex.org/W2385859805, https://openalex.org/W3137890128 |
| cited_by_count | 1 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2401.05746 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2401.05746 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2401.05746 |
| primary_location.id | pmh:oai:arXiv.org:2401.05746 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2401.05746 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2401.05746 |
| publication_date | 2024-01-11 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 69, 82 |
| abstract_inverted_index.To | 41 |
| abstract_inverted_index.an | 61 |
| abstract_inverted_index.by | 34 |
| abstract_inverted_index.in | 5, 30, 38 |
| abstract_inverted_index.of | 109 |
| abstract_inverted_index.on | 16, 98 |
| abstract_inverted_index.to | 26, 50, 73, 92 |
| abstract_inverted_index.we | 44 |
| abstract_inverted_index.Our | 58 |
| abstract_inverted_index.and | 28, 47, 68, 107 |
| abstract_inverted_index.due | 25 |
| abstract_inverted_index.for | 20, 65 |
| abstract_inverted_index.our | 110 |
| abstract_inverted_index.the | 99, 105 |
| abstract_inverted_index.data | 19 |
| abstract_inverted_index.face | 23 |
| abstract_inverted_index.with | 89 |
| abstract_inverted_index.align | 74 |
| abstract_inverted_index.cues. | 11 |
| abstract_inverted_index.fused | 17 |
| abstract_inverted_index.this, | 43 |
| abstract_inverted_index.train | 15 |
| abstract_inverted_index.using | 8 |
| abstract_inverted_index.video | 7 |
| abstract_inverted_index.which | 14 |
| abstract_inverted_index.caused | 33 |
| abstract_inverted_index.during | 54 |
| abstract_inverted_index.module | 64, 72, 85 |
| abstract_inverted_index.paired | 75 |
| abstract_inverted_index.public | 6, 100 |
| abstract_inverted_index.retain | 93 |
| abstract_inverted_index.Current | 12 |
| abstract_inverted_index.address | 42 |
| abstract_inverted_index.learned | 31 |
| abstract_inverted_index.propose | 45 |
| abstract_inverted_index.refines | 86 |
| abstract_inverted_index.results | 97 |
| abstract_inverted_index.targets | 22, 91 |
| abstract_inverted_index.videos. | 40 |
| abstract_inverted_index.approach | 59 |
| abstract_inverted_index.dataset, | 102 |
| abstract_inverted_index.deepfake | 1, 39 |
| abstract_inverted_index.details. | 95 |
| abstract_inverted_index.includes | 60 |
| abstract_inverted_index.methods, | 13 |
| abstract_inverted_index.modality | 36, 52, 66, 79 |
| abstract_inverted_index.preserve | 51 |
| abstract_inverted_index.signals, | 77 |
| abstract_inverted_index.unimodal | 87 |
| abstract_inverted_index.approach. | 111 |
| abstract_inverted_index.detection | 2 |
| abstract_inverted_index.learning. | 57 |
| abstract_inverted_index.challenges | 24 |
| abstract_inverted_index.multimodal | 10, 18, 21, 55 |
| abstract_inverted_index.preserving | 78 |
| abstract_inverted_index.demonstrate | 104 |
| abstract_inverted_index.independent | 35 |
| abstract_inverted_index.scrutinizes | 3 |
| abstract_inverted_index.transformer | 63 |
| abstract_inverted_index.Audio-visual | 0 |
| abstract_inverted_index.Experimental | 96 |
| abstract_inverted_index.FakeAVCeleb, | 103 |
| abstract_inverted_index.audio-visual | 62, 76, 101 |
| abstract_inverted_index.distinctions | 53 |
| abstract_inverted_index.complementary | 9 |
| abstract_inverted_index.distinctions. | 80 |
| abstract_inverted_index.effectiveness | 106 |
| abstract_inverted_index.manipulations | 4, 37 |
| abstract_inverted_index.uncertainties | 27 |
| abstract_inverted_index.correspondence | 67 |
| abstract_inverted_index.cross-modality | 46, 70 |
| abstract_inverted_index.modal-specific | 94 |
| abstract_inverted_index.regularization | 49, 71, 84 |
| abstract_inverted_index.representation | 56 |
| abstract_inverted_index.Simultaneously, | 81 |
| abstract_inverted_index.competitiveness | 108 |
| abstract_inverted_index.inconsistencies | 29 |
| abstract_inverted_index.representations | 32, 88 |
| abstract_inverted_index.within-modality | 48, 83 |
| abstract_inverted_index.modality-specific | 90 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 6 |
| citation_normalized_percentile | |
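
The `abstract_inverted_index.*` rows in the payload store the abstract as a map from each token to the word positions where it occurs. As a minimal sketch, the plain-text abstract can be rebuilt by inverting that map and joining tokens in position order. The helper names `parse_rows` and `rebuild_abstract` are made up for this example, and the sample rows are an excerpt limited to positions 0 to 11 of the payload above.

```python
PREFIX = "abstract_inverted_index."


def parse_rows(rows):
    """Turn '| abstract_inverted_index.<token> | p1, p2 |' rows
    into a {token: [positions]} dictionary; other rows are skipped."""
    index = {}
    for row in rows:
        cells = [c.strip() for c in row.strip().strip("|").split("|")]
        if len(cells) != 2 or not cells[0].startswith(PREFIX):
            continue
        token = cells[0][len(PREFIX):]
        index[token] = [int(p) for p in cells[1].split(",")]
    return index


def rebuild_abstract(inverted_index):
    """Place every token at each of its positions, then join in order."""
    by_position = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Excerpt of the payload rows, restricted to the first sentence.
rows = [
    "| abstract_inverted_index.in | 5 |",
    "| abstract_inverted_index.cues. | 11 |",
    "| abstract_inverted_index.using | 8 |",
    "| abstract_inverted_index.video | 7 |",
    "| abstract_inverted_index.public | 6 |",
    "| abstract_inverted_index.deepfake | 1 |",
    "| abstract_inverted_index.detection | 2 |",
    "| abstract_inverted_index.multimodal | 10 |",
    "| abstract_inverted_index.scrutinizes | 3 |",
    "| abstract_inverted_index.Audio-visual | 0 |",
    "| abstract_inverted_index.complementary | 9 |",
    "| abstract_inverted_index.manipulations | 4 |",
]

if __name__ == "__main__":
    print(rebuild_abstract(parse_rows(rows)))
```

Run on the full set of rows, the same two helpers should reproduce the abstract shown at the top of the record, since every position from 0 to 111 appears exactly once in the index.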