Audio–Visual Fusion Based on Interactive Attention for Person Verification
2023 · Open Access · DOI: https://doi.org/10.3390/s23249845
With the rapid development of multimedia technology, person verification systems have become increasingly important in security and identity verification. However, unimodal verification systems hit performance bottlenecks in complex scenarios, which motivates multimodal feature fusion methods. The main problem in audio–visual multimodal feature fusion is how to integrate information from the different modalities effectively, so as to improve the accuracy and robustness of identity verification. In this paper, we focus on how to improve multimodal person verification systems and how to combine audio and visual features. We use pretrained models to extract embeddings from each modality and then run fusion experiments on these embeddings. The baseline approach passes the fused feature through a fully connected (FC) layer. Building upon this baseline, we propose three fusion models based on attention mechanisms: attention, gated, and inter–attention. These fusion models are trained on the VoxCeleb1 development set and tested on the evaluation sets of the VoxCeleb1, NIST SRE19, and CNC-AV datasets. On the VoxCeleb1 dataset, the best system in this study achieved an equal error rate (EER) of 0.23% and a minimum detection cost function (minDCF) of 0.011. On the NIST SRE19 evaluation set, the EER was 2.60% and the minDCF was 0.283. On the CNC-AV evaluation set, the EER was 11.30% and the minDCF was 0.443. These experimental results demonstrate that the proposed fusion method can significantly improve the performance of multimodal person verification systems.
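The abstract's baseline (a fused feature passed through an FC layer) and its headline metric (EER) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the fused feature is formed, so plain concatenation is assumed here, and the function names `fuse_baseline` and `equal_error_rate`, the weights, and the toy scores are all hypothetical. The EER is approximated by sweeping a threshold over the observed scores and taking the point where the false accept and false reject rates are closest.

```python
from typing import List, Sequence


def fuse_baseline(audio: Sequence[float], visual: Sequence[float],
                  weight: Sequence[Sequence[float]],
                  bias: Sequence[float]) -> List[float]:
    """Concatenate two embeddings and apply one fully connected layer.

    Assumption: the fused feature is a plain concatenation; weight has
    shape (out_dim, len(audio) + len(visual)).
    """
    fused = list(audio) + list(visual)
    if len(weight) != len(bias) or any(len(row) != len(fused) for row in weight):
        raise ValueError("weight/bias shapes do not match the fused feature")
    return [sum(w * x for w, x in zip(row, fused)) + b
            for row, b in zip(weight, bias)]


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate the EER from trial scores.

    labels: 1 for a target (same person) trial, 0 for a non-target trial.
    A trial is accepted when its score is >= the threshold.
    """
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need at least one target and one non-target trial")
    best_gap, eer = float("inf"), 1.0
    # Sweep the threshold over every observed score, plus +inf (reject all).
    for threshold in sorted(set(scores)) + [float("inf")]:
        false_accepts = sum(1 for s, y in zip(scores, labels)
                            if y == 0 and s >= threshold)
        false_rejects = sum(1 for s, y in zip(scores, labels)
                            if y == 1 and s < threshold)
        far = false_accepts / n_nontarget
        frr = false_rejects / n_target
        # Keep the operating point where the two error rates are closest.
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


# Toy usage with made-up numbers (not data from the paper).
embedding = fuse_baseline([1.0, 2.0], [3.0],
                          weight=[[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]],
                          bias=[0.5, -1.0])
toy_eer = equal_error_rate([0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1],
                           [1, 1, 0, 1, 1, 0, 0, 0])
```

In practice the trial scores would come from comparing fused embeddings of an enrollment sample and a test sample (for example by cosine similarity), and a reported EER of 0.23% corresponds to a return value of 0.0023 from a function like this one.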
Related Topics
- Type
- article
- Language
- en
- Landing Page
- https://doi.org/10.3390/s23249845
- https://www.mdpi.com/1424-8220/23/24/9845/pdf?version=1702645976
- OA Status
- gold
- Cited By
- 1
- References
- 45
- Related Works
- 10
- OpenAlex ID
- https://openalex.org/W4389913714
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4389913714 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.3390/s23249845 (Digital Object Identifier)
- Title: Audio–Visual Fusion Based on Interactive Attention for Person Verification (work title)
- Type: article (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2023 (year of publication)
- Publication date: 2023-12-15 (full publication date if available)
- Authors: Xuebin Jing, Liang He, Zhida Song, Shaolei Wang (list of authors in order)
- Landing page: https://doi.org/10.3390/s23249845 (publisher landing page)
- PDF URL: https://www.mdpi.com/1424-8220/23/24/9845/pdf?version=1702645976 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: gold (open access status per OpenAlex)
- OA URL: https://www.mdpi.com/1424-8220/23/24/9845/pdf?version=1702645976 (direct OA link when available)
- Concepts: Computer science, NIST, Robustness (evolution), Word error rate, Modalities, Fusion, Artificial intelligence, Set (abstract data type), Feature (linguistics), Audio visual, Modality (human–computer interaction), Baseline (sea), Sensor fusion, Machine learning, Speech recognition, Pattern recognition (psychology), Multimedia, Biochemistry, Gene, Philosophy, Linguistics, Sociology, Oceanography, Social science, Programming language, Geology, Chemistry (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 1 (total citation count in OpenAlex)
- Citations by year (recent): 2024: 1 (per-year citation counts, last 5 years)
- References (count): 45 (number of works referenced by this work)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4389913714 |
|---|---|
| doi | https://doi.org/10.3390/s23249845 |
| ids.doi | https://doi.org/10.3390/s23249845 |
| ids.pmid | https://pubmed.ncbi.nlm.nih.gov/38139689 |
| ids.openalex | https://openalex.org/W4389913714 |
| fwci | 0.26843499 |
| mesh[0].qualifier_ui | |
| mesh[0].descriptor_ui | D006801 |
| mesh[0].is_major_topic | False |
| mesh[0].qualifier_name | |
| mesh[0].descriptor_name | Humans |
| mesh[1].qualifier_ui | |
| mesh[1].descriptor_ui | D000073256 |
| mesh[1].is_major_topic | True |
| mesh[1].qualifier_name | |
| mesh[1].descriptor_name | Information Technology |
| mesh[2].qualifier_ui | |
| mesh[2].descriptor_ui | D056667 |
| mesh[2].is_major_topic | True |
| mesh[2].qualifier_name | |
| mesh[2].descriptor_name | Biometric Identification |
| mesh[3].qualifier_ui | |
| mesh[3].descriptor_ui | D006801 |
| mesh[3].is_major_topic | False |
| mesh[3].qualifier_name | |
| mesh[3].descriptor_name | Humans |
| mesh[4].qualifier_ui | |
| mesh[4].descriptor_ui | D000073256 |
| mesh[4].is_major_topic | True |
| mesh[4].qualifier_name | |
| mesh[4].descriptor_name | Information Technology |
| mesh[5].qualifier_ui | |
| mesh[5].descriptor_ui | D056667 |
| mesh[5].is_major_topic | True |
| mesh[5].qualifier_name | |
| mesh[5].descriptor_name | Biometric Identification |
| mesh[6].qualifier_ui | |
| mesh[6].descriptor_ui | D006801 |
| mesh[6].is_major_topic | False |
| mesh[6].qualifier_name | |
| mesh[6].descriptor_name | Humans |
| mesh[7].qualifier_ui | |
| mesh[7].descriptor_ui | D000073256 |
| mesh[7].is_major_topic | True |
| mesh[7].qualifier_name | |
| mesh[7].descriptor_name | Information Technology |
| type | article |
| title | Audio–Visual Fusion Based on Interactive Attention for Person Verification |
| biblio.issue | 24 |
| biblio.volume | 23 |
| biblio.last_page | 9845 |
| biblio.first_page | 9845 |
| grants[0].funder | https://openalex.org/F4320335777 |
| grants[0].award_id | 2022ZD0115801 |
| grants[0].funder_display_name | National Key Research and Development Program of China |
| topics[0].id | https://openalex.org/T10860 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9987999796867371 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1711 |
| topics[0].subfield.display_name | Signal Processing |
| topics[0].display_name | Speech and Audio Processing |
| topics[1].id | https://openalex.org/T11309 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9986000061035156 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1711 |
| topics[1].subfield.display_name | Signal Processing |
| topics[1].display_name | Music and Audio Processing |
| topics[2].id | https://openalex.org/T11398 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9944999814033508 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1709 |
| topics[2].subfield.display_name | Human-Computer Interaction |
| topics[2].display_name | Hand Gesture Recognition Systems |
| funders[0].id | https://openalex.org/F4320335777 |
| funders[0].ror | |
| funders[0].display_name | National Key Research and Development Program of China |
| is_xpac | False |
| apc_list.value | 2400 |
| apc_list.currency | CHF |
| apc_list.value_usd | 2598 |
| apc_paid.value | 2400 |
| apc_paid.currency | CHF |
| apc_paid.value_usd | 2598 |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.7922955751419067 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C111219384 |
| concepts[1].level | 2 |
| concepts[1].score | 0.7841784954071045 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q6954384 |
| concepts[1].display_name | NIST |
| concepts[2].id | https://openalex.org/C63479239 |
| concepts[2].level | 3 |
| concepts[2].score | 0.6609699726104736 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q7353546 |
| concepts[2].display_name | Robustness (evolution) |
| concepts[3].id | https://openalex.org/C40969351 |
| concepts[3].level | 2 |
| concepts[3].score | 0.570180356502533 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q3516228 |
| concepts[3].display_name | Word error rate |
| concepts[4].id | https://openalex.org/C2779903281 |
| concepts[4].level | 2 |
| concepts[4].score | 0.5657870769500732 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q6888026 |
| concepts[4].display_name | Modalities |
| concepts[5].id | https://openalex.org/C158525013 |
| concepts[5].level | 2 |
| concepts[5].score | 0.5562260746955872 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q2593739 |
| concepts[5].display_name | Fusion |
| concepts[6].id | https://openalex.org/C154945302 |
| concepts[6].level | 1 |
| concepts[6].score | 0.5442796349525452 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[6].display_name | Artificial intelligence |
| concepts[7].id | https://openalex.org/C177264268 |
| concepts[7].level | 2 |
| concepts[7].score | 0.5376793742179871 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q1514741 |
| concepts[7].display_name | Set (abstract data type) |
| concepts[8].id | https://openalex.org/C2776401178 |
| concepts[8].level | 2 |
| concepts[8].score | 0.48539531230926514 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q12050496 |
| concepts[8].display_name | Feature (linguistics) |
| concepts[9].id | https://openalex.org/C3017588708 |
| concepts[9].level | 2 |
| concepts[9].score | 0.4574311375617981 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q758901 |
| concepts[9].display_name | Audio visual |
| concepts[10].id | https://openalex.org/C2780226545 |
| concepts[10].level | 2 |
| concepts[10].score | 0.4571736454963684 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q6888030 |
| concepts[10].display_name | Modality (human–computer interaction) |
| concepts[11].id | https://openalex.org/C12725497 |
| concepts[11].level | 2 |
| concepts[11].score | 0.44133228063583374 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q810247 |
| concepts[11].display_name | Baseline (sea) |
| concepts[12].id | https://openalex.org/C33954974 |
| concepts[12].level | 2 |
| concepts[12].score | 0.42646270990371704 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q486494 |
| concepts[12].display_name | Sensor fusion |
| concepts[13].id | https://openalex.org/C119857082 |
| concepts[13].level | 1 |
| concepts[13].score | 0.41166603565216064 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q2539 |
| concepts[13].display_name | Machine learning |
| concepts[14].id | https://openalex.org/C28490314 |
| concepts[14].level | 1 |
| concepts[14].score | 0.3981632888317108 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q189436 |
| concepts[14].display_name | Speech recognition |
| concepts[15].id | https://openalex.org/C153180895 |
| concepts[15].level | 2 |
| concepts[15].score | 0.3873976469039917 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q7148389 |
| concepts[15].display_name | Pattern recognition (psychology) |
| concepts[16].id | https://openalex.org/C49774154 |
| concepts[16].level | 1 |
| concepts[16].score | 0.11221885681152344 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q131765 |
| concepts[16].display_name | Multimedia |
| concepts[17].id | https://openalex.org/C55493867 |
| concepts[17].level | 1 |
| concepts[17].score | 0.0 |
| concepts[17].wikidata | https://www.wikidata.org/wiki/Q7094 |
| concepts[17].display_name | Biochemistry |
| concepts[18].id | https://openalex.org/C104317684 |
| concepts[18].level | 2 |
| concepts[18].score | 0.0 |
| concepts[18].wikidata | https://www.wikidata.org/wiki/Q7187 |
| concepts[18].display_name | Gene |
| concepts[19].id | https://openalex.org/C138885662 |
| concepts[19].level | 0 |
| concepts[19].score | 0.0 |
| concepts[19].wikidata | https://www.wikidata.org/wiki/Q5891 |
| concepts[19].display_name | Philosophy |
| concepts[20].id | https://openalex.org/C41895202 |
| concepts[20].level | 1 |
| concepts[20].score | 0.0 |
| concepts[20].wikidata | https://www.wikidata.org/wiki/Q8162 |
| concepts[20].display_name | Linguistics |
| concepts[21].id | https://openalex.org/C144024400 |
| concepts[21].level | 0 |
| concepts[21].score | 0.0 |
| concepts[21].wikidata | https://www.wikidata.org/wiki/Q21201 |
| concepts[21].display_name | Sociology |
| concepts[22].id | https://openalex.org/C111368507 |
| concepts[22].level | 1 |
| concepts[22].score | 0.0 |
| concepts[22].wikidata | https://www.wikidata.org/wiki/Q43518 |
| concepts[22].display_name | Oceanography |
| concepts[23].id | https://openalex.org/C36289849 |
| concepts[23].level | 1 |
| concepts[23].score | 0.0 |
| concepts[23].wikidata | https://www.wikidata.org/wiki/Q34749 |
| concepts[23].display_name | Social science |
| concepts[24].id | https://openalex.org/C199360897 |
| concepts[24].level | 1 |
| concepts[24].score | 0.0 |
| concepts[24].wikidata | https://www.wikidata.org/wiki/Q9143 |
| concepts[24].display_name | Programming language |
| concepts[25].id | https://openalex.org/C127313418 |
| concepts[25].level | 0 |
| concepts[25].score | 0.0 |
| concepts[25].wikidata | https://www.wikidata.org/wiki/Q1069 |
| concepts[25].display_name | Geology |
| concepts[26].id | https://openalex.org/C185592680 |
| concepts[26].level | 0 |
| concepts[26].score | 0.0 |
| concepts[26].wikidata | https://www.wikidata.org/wiki/Q2329 |
| concepts[26].display_name | Chemistry |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.7922955751419067 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/nist |
| keywords[1].score | 0.7841784954071045 |
| keywords[1].display_name | NIST |
| keywords[2].id | https://openalex.org/keywords/robustness |
| keywords[2].score | 0.6609699726104736 |
| keywords[2].display_name | Robustness (evolution) |
| keywords[3].id | https://openalex.org/keywords/word-error-rate |
| keywords[3].score | 0.570180356502533 |
| keywords[3].display_name | Word error rate |
| keywords[4].id | https://openalex.org/keywords/modalities |
| keywords[4].score | 0.5657870769500732 |
| keywords[4].display_name | Modalities |
| keywords[5].id | https://openalex.org/keywords/fusion |
| keywords[5].score | 0.5562260746955872 |
| keywords[5].display_name | Fusion |
| keywords[6].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[6].score | 0.5442796349525452 |
| keywords[6].display_name | Artificial intelligence |
| keywords[7].id | https://openalex.org/keywords/set |
| keywords[7].score | 0.5376793742179871 |
| keywords[7].display_name | Set (abstract data type) |
| keywords[8].id | https://openalex.org/keywords/feature |
| keywords[8].score | 0.48539531230926514 |
| keywords[8].display_name | Feature (linguistics) |
| keywords[9].id | https://openalex.org/keywords/audio-visual |
| keywords[9].score | 0.4574311375617981 |
| keywords[9].display_name | Audio visual |
| keywords[10].id | https://openalex.org/keywords/modality |
| keywords[10].score | 0.4571736454963684 |
| keywords[10].display_name | Modality (human–computer interaction) |
| keywords[11].id | https://openalex.org/keywords/baseline |
| keywords[11].score | 0.44133228063583374 |
| keywords[11].display_name | Baseline (sea) |
| keywords[12].id | https://openalex.org/keywords/sensor-fusion |
| keywords[12].score | 0.42646270990371704 |
| keywords[12].display_name | Sensor fusion |
| keywords[13].id | https://openalex.org/keywords/machine-learning |
| keywords[13].score | 0.41166603565216064 |
| keywords[13].display_name | Machine learning |
| keywords[14].id | https://openalex.org/keywords/speech-recognition |
| keywords[14].score | 0.3981632888317108 |
| keywords[14].display_name | Speech recognition |
| keywords[15].id | https://openalex.org/keywords/pattern-recognition |
| keywords[15].score | 0.3873976469039917 |
| keywords[15].display_name | Pattern recognition (psychology) |
| keywords[16].id | https://openalex.org/keywords/multimedia |
| keywords[16].score | 0.11221885681152344 |
| keywords[16].display_name | Multimedia |
| language | en |
| locations[0].id | doi:10.3390/s23249845 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S101949793 |
| locations[0].source.issn | 1424-8220 |
| locations[0].source.type | journal |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | 1424-8220 |
| locations[0].source.is_core | True |
| locations[0].source.is_in_doaj | True |
| locations[0].source.display_name | Sensors |
| locations[0].source.host_organization | https://openalex.org/P4310310987 |
| locations[0].source.host_organization_name | Multidisciplinary Digital Publishing Institute |
| locations[0].source.host_organization_lineage | https://openalex.org/P4310310987 |
| locations[0].license | cc-by |
| locations[0].pdf_url | https://www.mdpi.com/1424-8220/23/24/9845/pdf?version=1702645976 |
| locations[0].version | publishedVersion |
| locations[0].raw_type | journal-article |
| locations[0].license_id | https://openalex.org/licenses/cc-by |
| locations[0].is_accepted | True |
| locations[0].is_published | True |
| locations[0].raw_source_name | Sensors |
| locations[0].landing_page_url | https://doi.org/10.3390/s23249845 |
| locations[1].id | pmid:38139689 |
| locations[1].is_oa | False |
| locations[1].source.id | https://openalex.org/S4306525036 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | False |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | PubMed |
| locations[1].source.host_organization | https://openalex.org/I1299303238 |
| locations[1].source.host_organization_name | National Institutes of Health |
| locations[1].source.host_organization_lineage | https://openalex.org/I1299303238 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | publishedVersion |
| locations[1].raw_type | |
| locations[1].license_id | |
| locations[1].is_accepted | True |
| locations[1].is_published | True |
| locations[1].raw_source_name | Sensors (Basel, Switzerland) |
| locations[1].landing_page_url | https://pubmed.ncbi.nlm.nih.gov/38139689 |
| locations[2].id | pmh:oai:pubmedcentral.nih.gov:10747811 |
| locations[2].is_oa | True |
| locations[2].source.id | https://openalex.org/S2764455111 |
| locations[2].source.issn | |
| locations[2].source.type | repository |
| locations[2].source.is_oa | False |
| locations[2].source.issn_l | |
| locations[2].source.is_core | False |
| locations[2].source.is_in_doaj | False |
| locations[2].source.display_name | PubMed Central |
| locations[2].source.host_organization | https://openalex.org/I1299303238 |
| locations[2].source.host_organization_name | National Institutes of Health |
| locations[2].source.host_organization_lineage | https://openalex.org/I1299303238 |
| locations[2].license | cc-by |
| locations[2].pdf_url | https://pmc.ncbi.nlm.nih.gov/articles/PMC10747811/pdf/sensors-23-09845.pdf |
| locations[2].version | submittedVersion |
| locations[2].raw_type | Text |
| locations[2].license_id | https://openalex.org/licenses/cc-by |
| locations[2].is_accepted | False |
| locations[2].is_published | False |
| locations[2].raw_source_name | Sensors (Basel) |
| locations[2].landing_page_url | https://www.ncbi.nlm.nih.gov/pmc/articles/10747811 |
| locations[3].id | pmh:oai:doaj.org/article:c218800cbf694fb6a38c9cbfee0e9abf |
| locations[3].is_oa | False |
| locations[3].source.id | https://openalex.org/S4306401280 |
| locations[3].source.issn | |
| locations[3].source.type | repository |
| locations[3].source.is_oa | False |
| locations[3].source.issn_l | |
| locations[3].source.is_core | False |
| locations[3].source.is_in_doaj | False |
| locations[3].source.display_name | DOAJ (DOAJ: Directory of Open Access Journals) |
| locations[3].source.host_organization | |
| locations[3].source.host_organization_name | |
| locations[3].source.host_organization_lineage | |
| locations[3].license | |
| locations[3].pdf_url | |
| locations[3].version | submittedVersion |
| locations[3].raw_type | article |
| locations[3].license_id | |
| locations[3].is_accepted | False |
| locations[3].is_published | False |
| locations[3].raw_source_name | Sensors, Vol 23, Iss 24, p 9845 (2023) |
| locations[3].landing_page_url | https://doaj.org/article/c218800cbf694fb6a38c9cbfee0e9abf |
| indexed_in | crossref, doaj, pubmed |
| authorships[0].author.id | https://openalex.org/A5111298718 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Xuebin Jing |
| authorships[0].countries | CN |
| authorships[0].affiliations[0].raw_affiliation_string | Xinjiang Key Laboratory of Signal Detection and Processing, Urumqi 830017, China |
| authorships[0].affiliations[1].institution_ids | https://openalex.org/I96908189 |
| authorships[0].affiliations[1].raw_affiliation_string | School of Computer Science and Technology, Xinjiang University, Urumqi 830017, China |
| authorships[0].institutions[0].id | https://openalex.org/I96908189 |
| authorships[0].institutions[0].ror | https://ror.org/059gw8r13 |
| authorships[0].institutions[0].type | education |
| authorships[0].institutions[0].lineage | https://openalex.org/I96908189 |
| authorships[0].institutions[0].country_code | CN |
| authorships[0].institutions[0].display_name | Xinjiang University |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Xuebin Jing |
| authorships[0].is_corresponding | False |
| authorships[0].raw_affiliation_strings | School of Computer Science and Technology, Xinjiang University, Urumqi 830017, China, Xinjiang Key Laboratory of Signal Detection and Processing, Urumqi 830017, China |
| authorships[1].author.id | https://openalex.org/A5049944728 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-4076-7479 |
| authorships[1].author.display_name | Liang He |
| authorships[1].countries | CN |
| authorships[1].affiliations[0].institution_ids | https://openalex.org/I99065089 |
| authorships[1].affiliations[0].raw_affiliation_string | Department of Electronic Engineering, and Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China |
| authorships[1].affiliations[1].institution_ids | https://openalex.org/I96908189 |
| authorships[1].affiliations[1].raw_affiliation_string | School of Computer Science and Technology, Xinjiang University, Urumqi 830017, China |
| authorships[1].affiliations[2].raw_affiliation_string | Xinjiang Key Laboratory of Signal Detection and Processing, Urumqi 830017, China |
| authorships[1].institutions[0].id | https://openalex.org/I99065089 |
| authorships[1].institutions[0].ror | https://ror.org/03cve4549 |
| authorships[1].institutions[0].type | education |
| authorships[1].institutions[0].lineage | https://openalex.org/I99065089 |
| authorships[1].institutions[0].country_code | CN |
| authorships[1].institutions[0].display_name | Tsinghua University |
| authorships[1].institutions[1].id | https://openalex.org/I96908189 |
| authorships[1].institutions[1].ror | https://ror.org/059gw8r13 |
| authorships[1].institutions[1].type | education |
| authorships[1].institutions[1].lineage | https://openalex.org/I96908189 |
| authorships[1].institutions[1].country_code | CN |
| authorships[1].institutions[1].display_name | Xinjiang University |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Liang He |
| authorships[1].is_corresponding | True |
| authorships[1].raw_affiliation_strings | Department of Electronic Engineering, and Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China, School of Computer Science and Technology, Xinjiang University, Urumqi 830017, China, Xinjiang Key Laboratory of Signal Detection and Processing, Urumqi 830017, China |
| authorships[2].author.id | https://openalex.org/A5020817636 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-4987-5962 |
| authorships[2].author.display_name | Zhida Song |
| authorships[2].countries | CN |
| authorships[2].affiliations[0].raw_affiliation_string | Xinjiang Key Laboratory of Signal Detection and Processing, Urumqi 830017, China |
| authorships[2].affiliations[1].institution_ids | https://openalex.org/I96908189 |
| authorships[2].affiliations[1].raw_affiliation_string | School of Computer Science and Technology, Xinjiang University, Urumqi 830017, China |
| authorships[2].institutions[0].id | https://openalex.org/I96908189 |
| authorships[2].institutions[0].ror | https://ror.org/059gw8r13 |
| authorships[2].institutions[0].type | education |
| authorships[2].institutions[0].lineage | https://openalex.org/I96908189 |
| authorships[2].institutions[0].country_code | CN |
| authorships[2].institutions[0].display_name | Xinjiang University |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Zhida Song |
| authorships[2].is_corresponding | False |
| authorships[2].raw_affiliation_strings | School of Computer Science and Technology, Xinjiang University, Urumqi 830017, China, Xinjiang Key Laboratory of Signal Detection and Processing, Urumqi 830017, China |
| authorships[3].author.id | https://openalex.org/A5070059389 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-0691-9179 |
| authorships[3].author.display_name | Shaolei Wang |
| authorships[3].countries | CN |
| authorships[3].affiliations[0].raw_affiliation_string | Xinjiang Key Laboratory of Signal Detection and Processing, Urumqi 830017, China |
| authorships[3].affiliations[1].institution_ids | https://openalex.org/I96908189 |
| authorships[3].affiliations[1].raw_affiliation_string | School of Computer Science and Technology, Xinjiang University, Urumqi 830017, China |
| authorships[3].institutions[0].id | https://openalex.org/I96908189 |
| authorships[3].institutions[0].ror | https://ror.org/059gw8r13 |
| authorships[3].institutions[0].type | education |
| authorships[3].institutions[0].lineage | https://openalex.org/I96908189 |
| authorships[3].institutions[0].country_code | CN |
| authorships[3].institutions[0].display_name | Xinjiang University |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Shaolei Wang |
| authorships[3].is_corresponding | False |
| authorships[3].raw_affiliation_strings | School of Computer Science and Technology, Xinjiang University, Urumqi 830017, China, Xinjiang Key Laboratory of Signal Detection and Processing, Urumqi 830017, China |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://www.mdpi.com/1424-8220/23/24/9845/pdf?version=1702645976 |
| open_access.oa_status | gold |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Audio–Visual Fusion Based on Interactive Attention for Person Verification |
| has_fulltext | True |
| is_retracted | False |
| updated_date | 2025-11-06T03:46:38.306776 |
| primary_topic.id | https://openalex.org/T10860 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9987999796867371 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1711 |
| primary_topic.subfield.display_name | Signal Processing |
| primary_topic.display_name | Speech and Audio Processing |
| related_works | https://openalex.org/W2158491338, https://openalex.org/W2807901368, https://openalex.org/W2133733652, https://openalex.org/W2072658171, https://openalex.org/W2606392311, https://openalex.org/W2320042380, https://openalex.org/W4385956668, https://openalex.org/W2900895161, https://openalex.org/W4380838366, https://openalex.org/W2539884462 |
| cited_by_count | 1 |
| counts_by_year[0].year | 2024 |
| counts_by_year[0].cited_by_count | 1 |
| locations_count | 4 |
| best_oa_location.id | doi:10.3390/s23249845 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S101949793 |
| best_oa_location.source.issn | 1424-8220 |
| best_oa_location.source.type | journal |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | 1424-8220 |
| best_oa_location.source.is_core | True |
| best_oa_location.source.is_in_doaj | True |
| best_oa_location.source.display_name | Sensors |
| best_oa_location.source.host_organization | https://openalex.org/P4310310987 |
| best_oa_location.source.host_organization_name | Multidisciplinary Digital Publishing Institute |
| best_oa_location.source.host_organization_lineage | https://openalex.org/P4310310987 |
| best_oa_location.license | cc-by |
| best_oa_location.pdf_url | https://www.mdpi.com/1424-8220/23/24/9845/pdf?version=1702645976 |
| best_oa_location.version | publishedVersion |
| best_oa_location.raw_type | journal-article |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by |
| best_oa_location.is_accepted | True |
| best_oa_location.is_published | True |
| best_oa_location.raw_source_name | Sensors |
| best_oa_location.landing_page_url | https://doi.org/10.3390/s23249845 |
| primary_location.id | doi:10.3390/s23249845 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S101949793 |
| primary_location.source.issn | 1424-8220 |
| primary_location.source.type | journal |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | 1424-8220 |
| primary_location.source.is_core | True |
| primary_location.source.is_in_doaj | True |
| primary_location.source.display_name | Sensors |
| primary_location.source.host_organization | https://openalex.org/P4310310987 |
| primary_location.source.host_organization_name | Multidisciplinary Digital Publishing Institute |
| primary_location.source.host_organization_lineage | https://openalex.org/P4310310987 |
| primary_location.license | cc-by |
| primary_location.pdf_url | https://www.mdpi.com/1424-8220/23/24/9845/pdf?version=1702645976 |
| primary_location.version | publishedVersion |
| primary_location.raw_type | journal-article |
| primary_location.license_id | https://openalex.org/licenses/cc-by |
| primary_location.is_accepted | True |
| primary_location.is_published | True |
| primary_location.raw_source_name | Sensors |
| primary_location.landing_page_url | https://doi.org/10.3390/s23249845 |
| publication_date | 2023-12-15 |
| publication_year | 2023 |
| referenced_works | https://openalex.org/W2890964092, https://openalex.org/W2173629880, https://openalex.org/W3024869864, https://openalex.org/W2969985801, https://openalex.org/W2747238065, https://openalex.org/W2784163702, https://openalex.org/W2902299888, https://openalex.org/W2777661975, https://openalex.org/W2163605009, https://openalex.org/W2145287260, https://openalex.org/W1950843348, https://openalex.org/W1998808035, https://openalex.org/W2325939864, https://openalex.org/W2096733369, https://openalex.org/W1516392727, https://openalex.org/W2150769028, https://openalex.org/W2046056978, https://openalex.org/W2748488820, https://openalex.org/W2799429369, https://openalex.org/W3025804484, https://openalex.org/W2154383330, https://openalex.org/W3126259644, https://openalex.org/W3126757411, https://openalex.org/W3209112689, https://openalex.org/W4224924564, https://openalex.org/W4379985555, https://openalex.org/W4293222191, https://openalex.org/W4304775708, https://openalex.org/W4372346152, https://openalex.org/W4372259900, https://openalex.org/W4372265901, https://openalex.org/W4382060833, https://openalex.org/W4308448078, https://openalex.org/W4361853108, https://openalex.org/W2806318065, https://openalex.org/W2726515241, https://openalex.org/W2808631503, https://openalex.org/W3161402421, https://openalex.org/W4385823274, https://openalex.org/W3026006730, https://openalex.org/W1832693441, https://openalex.org/W2019370496, https://openalex.org/W2736633948, https://openalex.org/W2520774990, https://openalex.org/W3103152812 |
| referenced_works_count | 45 |
| abstract_inverted_index.a | 129, 196 |
| abstract_inverted_index.In | 69, 90 |
| abstract_inverted_index.On | 175, 203, 219 |
| abstract_inverted_index.an | 188 |
| abstract_inverted_index.in | 14, 28, 117, 184 |
| abstract_inverted_index.is | 48 |
| abstract_inverted_index.it | 127 |
| abstract_inverted_index.of | 4, 63, 167, 193, 201, 207, 223, 251 |
| abstract_inverted_index.on | 74, 111, 144, 156, 163 |
| abstract_inverted_index.to | 50, 57, 76, 84, 97 |
| abstract_inverted_index.we | 72, 93, 138 |
| abstract_inverted_index.EER | 211, 228 |
| abstract_inverted_index.The | 40, 114 |
| abstract_inverted_index.and | 18, 61, 82, 87, 104, 125, 149, 161, 172, 195, 214, 231 |
| abstract_inverted_index.are | 154 |
| abstract_inverted_index.can | 246 |
| abstract_inverted_index.for | 35, 66 |
| abstract_inverted_index.how | 49, 75, 83 |
| abstract_inverted_index.set | 160, 206, 222 |
| abstract_inverted_index.the | 1, 15, 33, 59, 64, 99, 122, 157, 164, 168, 176, 179, 204, 210, 215, 220, 224, 227, 232, 242, 249 |
| abstract_inverted_index.use | 94 |
| abstract_inverted_index.was | 187, 212, 217, 229, 234 |
| abstract_inverted_index.(FC) | 132 |
| abstract_inverted_index.NIST | 170, 208 |
| abstract_inverted_index.With | 0 |
| abstract_inverted_index.best | 180 |
| abstract_inverted_index.cost | 198 |
| abstract_inverted_index.each | 102 |
| abstract_inverted_index.from | 54, 101 |
| abstract_inverted_index.have | 10, 25 |
| abstract_inverted_index.main | 41 |
| abstract_inverted_index.need | 34 |
| abstract_inverted_index.rate | 191 |
| abstract_inverted_index.set, | 226 |
| abstract_inverted_index.sets | 166 |
| abstract_inverted_index.that | 241 |
| abstract_inverted_index.then | 105 |
| abstract_inverted_index.this | 70, 91, 118, 136, 185 |
| abstract_inverted_index.thus | 31 |
| abstract_inverted_index.upon | 135 |
| abstract_inverted_index.with | 43 |
| abstract_inverted_index.(EER) | 192 |
| abstract_inverted_index.0.23% | 194 |
| abstract_inverted_index.2.60% | 213 |
| abstract_inverted_index.These | 151, 236 |
| abstract_inverted_index.audio | 86 |
| abstract_inverted_index.based | 110, 143 |
| abstract_inverted_index.equal | 189 |
| abstract_inverted_index.error | 190 |
| abstract_inverted_index.field | 17 |
| abstract_inverted_index.focus | 73 |
| abstract_inverted_index.fully | 130 |
| abstract_inverted_index.model | 108 |
| abstract_inverted_index.paper | 119 |
| abstract_inverted_index.rapid | 2 |
| abstract_inverted_index.study | 186 |
| abstract_inverted_index.these | 112 |
| abstract_inverted_index.three | 140 |
| abstract_inverted_index.0.011. | 202 |
| abstract_inverted_index.0.283. | 218 |
| abstract_inverted_index.0.443. | 235 |
| abstract_inverted_index.11.30% | 230 |
| abstract_inverted_index.CNC-AV | 173, 225 |
| abstract_inverted_index.SRE19, | 171, 209 |
| abstract_inverted_index.become | 11 |
| abstract_inverted_index.fusion | 38, 47, 107, 123, 141, 152, 244 |
| abstract_inverted_index.gated, | 148 |
| abstract_inverted_index.layer. | 133 |
| abstract_inverted_index.method | 245 |
| abstract_inverted_index.minDCF | 216, 233 |
| abstract_inverted_index.models | 96, 142, 153 |
| abstract_inverted_index.paper, | 71 |
| abstract_inverted_index.person | 79 |
| abstract_inverted_index.study, | 92 |
| abstract_inverted_index.system | 65, 181 |
| abstract_inverted_index.taking | 121 |
| abstract_inverted_index.tested | 162 |
| abstract_inverted_index.visual | 88 |
| abstract_inverted_index.combine | 85 |
| abstract_inverted_index.complex | 29 |
| abstract_inverted_index.extract | 98 |
| abstract_inverted_index.feature | 37, 46, 124 |
| abstract_inverted_index.improve | 58, 77, 248 |
| abstract_inverted_index.passing | 126 |
| abstract_inverted_index.perform | 106 |
| abstract_inverted_index.problem | 42 |
| abstract_inverted_index.propose | 139 |
| abstract_inverted_index.results | 238 |
| abstract_inverted_index.systems | 9, 24, 81 |
| abstract_inverted_index.through | 128 |
| abstract_inverted_index.trained | 155 |
| abstract_inverted_index.(minDCF) | 200 |
| abstract_inverted_index.Building | 134 |
| abstract_inverted_index.However, | 21 |
| abstract_inverted_index.accuracy | 60 |
| abstract_inverted_index.achieved | 183 |
| abstract_inverted_index.approach | 116 |
| abstract_inverted_index.baseline | 115 |
| abstract_inverted_index.dataset, | 178 |
| abstract_inverted_index.function | 199 |
| abstract_inverted_index.identity | 19 |
| abstract_inverted_index.involves | 120 |
| abstract_inverted_index.methods. | 39 |
| abstract_inverted_index.modality | 103 |
| abstract_inverted_index.proposed | 243 |
| abstract_inverted_index.security | 16 |
| abstract_inverted_index.strongly | 239 |
| abstract_inverted_index.systems. | 255 |
| abstract_inverted_index.unimodal | 22 |
| abstract_inverted_index.VoxCeleb1 | 158, 177 |
| abstract_inverted_index.baseline, | 137 |
| abstract_inverted_index.character | 253 |
| abstract_inverted_index.connected | 131 |
| abstract_inverted_index.datasets. | 174 |
| abstract_inverted_index.detection | 197 |
| abstract_inverted_index.different | 55 |
| abstract_inverted_index.features. | 89 |
| abstract_inverted_index.identity. | 68 |
| abstract_inverted_index.important | 13 |
| abstract_inverted_index.integrate | 52 |
| abstract_inverted_index.personnel | 7 |
| abstract_inverted_index.VoxCeleb1, | 169 |
| abstract_inverted_index.attention, | 147 |
| abstract_inverted_index.embeddings | 100 |
| abstract_inverted_index.evaluation | 165, 205, 221 |
| abstract_inverted_index.individual | 67 |
| abstract_inverted_index.modalities | 56 |
| abstract_inverted_index.multimedia | 5 |
| abstract_inverted_index.multimodal | 36, 45, 78, 252 |
| abstract_inverted_index.pretrained | 95 |
| abstract_inverted_index.robustness | 62 |
| abstract_inverted_index.scenarios, | 30 |
| abstract_inverted_index.triggering | 32 |
| abstract_inverted_index.attentional | 145 |
| abstract_inverted_index.bottlenecks | 27 |
| abstract_inverted_index.demonstrate | 240 |
| abstract_inverted_index.development | 3, 159 |
| abstract_inverted_index.effectively | 51 |
| abstract_inverted_index.embeddings. | 113 |
| abstract_inverted_index.experiments | 109 |
| abstract_inverted_index.information | 53 |
| abstract_inverted_index.mechanisms: | 146 |
| abstract_inverted_index.performance | 26, 182, 250 |
| abstract_inverted_index.technology, | 6 |
| abstract_inverted_index.experimental | 237 |
| abstract_inverted_index.increasingly | 12 |
| abstract_inverted_index.verification | 8, 23, 80, 254 |
| abstract_inverted_index.significantly | 247 |
| abstract_inverted_index.verification. | 20 |
| abstract_inverted_index.audio–visual | 44 |
| abstract_inverted_index.inter–attention. | 150 |
| cited_by_percentile_year.max | 94 |
| cited_by_percentile_year.min | 90 |
| corresponding_author_ids | https://openalex.org/A5049944728 |
| countries_distinct_count | 1 |
| institutions_distinct_count | 4 |
| corresponding_institution_ids | https://openalex.org/I96908189, https://openalex.org/I99065089 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/9 |
| sustainable_development_goals[0].score | 0.4300000071525574 |
| sustainable_development_goals[0].display_name | Industry, innovation and infrastructure |
| citation_normalized_percentile.value | 0.52932912 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | False |
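
The `abstract_inverted_index.*` rows above store the abstract as a map from each token to the word positions where it occurs (for example, `With` at 0 and `rapid` at 2). The following is a minimal sketch of how such an index could be turned back into running text. It assumes the rows have already been parsed into a Python dict; the function name `rebuild_abstract` and the `sample` excerpt are illustrative and are not part of the record. The excerpt keeps only positions 0 to 6 from the table.

```python
from typing import Dict, List


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Rebuild text from a token -> positions map (hypothetical helper)."""
    by_position: Dict[int, str] = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            # Each position holds exactly one token in a well-formed index.
            by_position[pos] = token
    # Emit tokens in ascending position order, separated by single spaces.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Excerpt of the table above, truncated to positions 0..6 for illustration.
sample = {
    "With": [0],
    "the": [1],
    "rapid": [2],
    "development": [3],
    "of": [4],
    "multimedia": [5],
    "technology,": [6],
}

print(rebuild_abstract(sample))
```

Punctuation stays attached to its token (as in `technology,`), so joining on single spaces is enough to recover the original wording for an index of this shape.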