Audio-Visual Speaker Diarization: Current Databases, Approaches and Challenges
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2409.05659
Nowadays, the large amount of audio-visual content available has fostered the need to develop new robust automatic speaker diarization systems to analyse and characterise it. Such systems reduce the cost of performing this process manually and make the speaker information usable in different applications, since a huge quantity of information is present, for example, images of faces or audio recordings. This paper therefore addresses a critical area in the field of speaker diarization systems: the integration of audio-visual content from different domains. It seeks to push beyond current state-of-the-art practices by developing a robust audio-visual speaker diarization framework adaptable to various data domains, including TV scenarios, meetings, and daily activities. Unlike most existing audio-visual speaker diarization systems, this framework will also include a proposed approach for precisely assigning specific identities in TV scenarios where celebrities appear. In addition, this work presents an extensive compilation of the current state-of-the-art approaches and the existing databases for developing audio-visual speaker diarization.
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2409.05659, https://arxiv.org/pdf/2409.05659
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4403618680
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4403618680 (Canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2409.05659 (Digital Object Identifier)
- Title: Audio-Visual Speaker Diarization: Current Databases, Approaches and Challenges (Work title)
- Type: preprint (OpenAlex work type)
- Language: en (Primary language)
- Publication year: 2024 (Year of publication)
- Publication date: 2024-09-09 (Full publication date if available)
- Authors: Victoria Mingote, Alfonso Ortega, Antonio Miguel, Eduardo Lleida (List of authors in order)
- Landing page: https://arxiv.org/abs/2409.05659 (Publisher landing page)
- PDF URL: https://arxiv.org/pdf/2409.05659 (Direct link to full text PDF)
- Open access: Yes (Whether a free full text is available)
- OA status: green (Open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2409.05659 (Direct OA link when available)
- Concepts: Speaker diarisation, Computer science, Speech recognition, Audio visual, Current (fluid), Database, Speaker recognition, Multimedia, Engineering, Electrical engineering (Top concepts (fields/topics) attached by OpenAlex)
- Cited by: 0 (Total citation count in OpenAlex)
- Related works (count): 10 (Other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4403618680 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2409.05659 |
| ids.doi | https://doi.org/10.48550/arxiv.2409.05659 |
| ids.openalex | https://openalex.org/W4403618680 |
| fwci | |
| type | preprint |
| title | Audio-Visual Speaker Diarization: Current Databases, Approaches and Challenges |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10860 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9779999852180481 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1711 |
| topics[0].subfield.display_name | Signal Processing |
| topics[0].display_name | Speech and Audio Processing |
| topics[1].id | https://openalex.org/T10201 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9739000201225281 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Speech Recognition and Synthesis |
| topics[2].id | https://openalex.org/T11309 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9627000093460083 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1711 |
| topics[2].subfield.display_name | Signal Processing |
| topics[2].display_name | Music and Audio Processing |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C149838564 |
| concepts[0].level | 3 |
| concepts[0].score | 0.844657301902771 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q7574248 |
| concepts[0].display_name | Speaker diarisation |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.6543349623680115 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C28490314 |
| concepts[2].level | 1 |
| concepts[2].score | 0.5843367576599121 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q189436 |
| concepts[2].display_name | Speech recognition |
| concepts[3].id | https://openalex.org/C3017588708 |
| concepts[3].level | 2 |
| concepts[3].score | 0.5565970540046692 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q758901 |
| concepts[3].display_name | Audio visual |
| concepts[4].id | https://openalex.org/C148043351 |
| concepts[4].level | 2 |
| concepts[4].score | 0.5010640621185303 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q4456944 |
| concepts[4].display_name | Current (fluid) |
| concepts[5].id | https://openalex.org/C77088390 |
| concepts[5].level | 1 |
| concepts[5].score | 0.44983798265457153 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q8513 |
| concepts[5].display_name | Database |
| concepts[6].id | https://openalex.org/C133892786 |
| concepts[6].level | 2 |
| concepts[6].score | 0.38840731978416443 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q1145189 |
| concepts[6].display_name | Speaker recognition |
| concepts[7].id | https://openalex.org/C49774154 |
| concepts[7].level | 1 |
| concepts[7].score | 0.16057053208351135 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q131765 |
| concepts[7].display_name | Multimedia |
| concepts[8].id | https://openalex.org/C127413603 |
| concepts[8].level | 0 |
| concepts[8].score | 0.07628786563873291 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q11023 |
| concepts[8].display_name | Engineering |
| concepts[9].id | https://openalex.org/C119599485 |
| concepts[9].level | 1 |
| concepts[9].score | 0.0 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q43035 |
| concepts[9].display_name | Electrical engineering |
| keywords[0].id | https://openalex.org/keywords/speaker-diarisation |
| keywords[0].score | 0.844657301902771 |
| keywords[0].display_name | Speaker diarisation |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.6543349623680115 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/speech-recognition |
| keywords[2].score | 0.5843367576599121 |
| keywords[2].display_name | Speech recognition |
| keywords[3].id | https://openalex.org/keywords/audio-visual |
| keywords[3].score | 0.5565970540046692 |
| keywords[3].display_name | Audio visual |
| keywords[4].id | https://openalex.org/keywords/current |
| keywords[4].score | 0.5010640621185303 |
| keywords[4].display_name | Current (fluid) |
| keywords[5].id | https://openalex.org/keywords/database |
| keywords[5].score | 0.44983798265457153 |
| keywords[5].display_name | Database |
| keywords[6].id | https://openalex.org/keywords/speaker-recognition |
| keywords[6].score | 0.38840731978416443 |
| keywords[6].display_name | Speaker recognition |
| keywords[7].id | https://openalex.org/keywords/multimedia |
| keywords[7].score | 0.16057053208351135 |
| keywords[7].display_name | Multimedia |
| keywords[8].id | https://openalex.org/keywords/engineering |
| keywords[8].score | 0.07628786563873291 |
| keywords[8].display_name | Engineering |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2409.05659 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2409.05659 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2409.05659 |
| locations[1].id | doi:10.48550/arxiv.2409.05659 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2409.05659 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5053658389 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-3505-0249 |
| authorships[0].author.display_name | Victoria Mingote |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Mingote, Victoria |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5101903408 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-3886-7748 |
| authorships[1].author.display_name | Alfonso Ortega |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Ortega, Alfonso |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5113499347 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-1552-5542 |
| authorships[2].author.display_name | Antonio Miguel |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Miguel, Antonio |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5036493563 |
| authorships[3].author.orcid | https://orcid.org/0000-0001-9137-4013 |
| authorships[3].author.display_name | Eduardo Lleida |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Lleida, Eduardo |
| authorships[3].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2409.05659 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Audio-Visual Speaker Diarization: Current Databases, Approaches and Challenges |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10860 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9779999852180481 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1711 |
| primary_topic.subfield.display_name | Signal Processing |
| primary_topic.display_name | Speech and Audio Processing |
| related_works | https://openalex.org/W2206035908, https://openalex.org/W1491159402, https://openalex.org/W4297807400, https://openalex.org/W4389984014, https://openalex.org/W2144208207, https://openalex.org/W1509309911, https://openalex.org/W1940231550, https://openalex.org/W1599425004, https://openalex.org/W2118860825, https://openalex.org/W2096510939 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2409.05659 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2409.05659 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2409.05659 |
| primary_location.id | pmh:oai:arXiv.org:2409.05659 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2409.05659 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2409.05659 |
| publication_date | 2024-09-09 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 51, 72, 101 |
| abstract_inverted_index.In | 152 |
| abstract_inverted_index.TV | 113, 147 |
| abstract_inverted_index.an | 136, 160 |
| abstract_inverted_index.as | 50 |
| abstract_inverted_index.by | 99 |
| abstract_inverted_index.in | 75, 146, 154 |
| abstract_inverted_index.is | 56 |
| abstract_inverted_index.of | 4, 27, 34, 43, 54, 61, 78, 84, 87, 121, 135, 143, 163 |
| abstract_inverted_index.or | 63 |
| abstract_inverted_index.to | 12, 20, 30, 70, 93, 108, 138 |
| abstract_inverted_index.we | 157 |
| abstract_inverted_index.and | 22, 39, 116, 168 |
| abstract_inverted_index.for | 47, 58, 172 |
| abstract_inverted_index.has | 8 |
| abstract_inverted_index.it. | 24 |
| abstract_inverted_index.new | 14 |
| abstract_inverted_index.the | 1, 10, 32, 41, 44, 76, 82, 122, 133, 140, 164, 169 |
| abstract_inverted_index.use | 42 |
| abstract_inverted_index.This | 25, 90 |
| abstract_inverted_index.aims | 69 |
| abstract_inverted_index.also | 131 |
| abstract_inverted_index.area | 74 |
| abstract_inverted_index.cost | 33 |
| abstract_inverted_index.data | 110 |
| abstract_inverted_index.have | 158 |
| abstract_inverted_index.huge | 52 |
| abstract_inverted_index.kind | 26 |
| abstract_inverted_index.lead | 139 |
| abstract_inverted_index.most | 120 |
| abstract_inverted_index.need | 11 |
| abstract_inverted_index.push | 94 |
| abstract_inverted_index.this | 36, 67, 128, 155 |
| abstract_inverted_index.will | 130 |
| abstract_inverted_index.audio | 64 |
| abstract_inverted_index.daily | 117 |
| abstract_inverted_index.doing | 35 |
| abstract_inverted_index.field | 77 |
| abstract_inverted_index.helps | 29 |
| abstract_inverted_index.large | 2 |
| abstract_inverted_index.paper | 68, 91 |
| abstract_inverted_index.seeks | 92 |
| abstract_inverted_index.where | 149 |
| abstract_inverted_index.work, | 156 |
| abstract_inverted_index.Unlike | 119 |
| abstract_inverted_index.allows | 40 |
| abstract_inverted_index.amount | 3 |
| abstract_inverted_index.beyond | 95 |
| abstract_inverted_index.faces, | 62 |
| abstract_inverted_index.images | 60 |
| abstract_inverted_index.reduce | 31 |
| abstract_inverted_index.robust | 15, 102 |
| abstract_inverted_index.system | 28 |
| abstract_inverted_index.address | 71 |
| abstract_inverted_index.analyse | 21 |
| abstract_inverted_index.appear. | 151 |
| abstract_inverted_index.content | 6, 86 |
| abstract_inverted_index.current | 96, 165 |
| abstract_inverted_index.develop | 13 |
| abstract_inverted_index.include | 132 |
| abstract_inverted_index.precise | 141 |
| abstract_inverted_index.process | 37 |
| abstract_inverted_index.speaker | 17, 45, 79, 104, 125, 175 |
| abstract_inverted_index.systems | 19 |
| abstract_inverted_index.various | 109 |
| abstract_inverted_index.approach | 137 |
| abstract_inverted_index.critical | 73 |
| abstract_inverted_index.domains, | 111 |
| abstract_inverted_index.domains. | 89 |
| abstract_inverted_index.example, | 59 |
| abstract_inverted_index.existing | 123, 170 |
| abstract_inverted_index.fostered | 9 |
| abstract_inverted_index.manually | 38 |
| abstract_inverted_index.present, | 57 |
| abstract_inverted_index.proposal | 134 |
| abstract_inverted_index.quantity | 53 |
| abstract_inverted_index.specific | 144 |
| abstract_inverted_index.systems, | 81, 127 |
| abstract_inverted_index.Nowadays, | 0 |
| abstract_inverted_index.adaptable | 107 |
| abstract_inverted_index.addition, | 153 |
| abstract_inverted_index.automatic | 16 |
| abstract_inverted_index.available | 7 |
| abstract_inverted_index.conducted | 159 |
| abstract_inverted_index.databases | 171 |
| abstract_inverted_index.different | 48, 88 |
| abstract_inverted_index.extensive | 161 |
| abstract_inverted_index.framework | 106, 129 |
| abstract_inverted_index.including | 112 |
| abstract_inverted_index.meetings, | 115 |
| abstract_inverted_index.practices | 98 |
| abstract_inverted_index.scenarios | 148 |
| abstract_inverted_index.Therefore, | 66 |
| abstract_inverted_index.approaches | 167 |
| abstract_inverted_index.assignment | 142 |
| abstract_inverted_index.developing | 100, 173 |
| abstract_inverted_index.identities | 145 |
| abstract_inverted_index.scenarios, | 114 |
| abstract_inverted_index.activities. | 118 |
| abstract_inverted_index.celebrities | 150 |
| abstract_inverted_index.compilation | 162 |
| abstract_inverted_index.diarization | 18, 80, 105, 126 |
| abstract_inverted_index.information | 46, 55 |
| abstract_inverted_index.integration | 83 |
| abstract_inverted_index.recordings. | 65 |
| abstract_inverted_index.audio-visual | 5, 85, 103, 124, 174 |
| abstract_inverted_index.characterise | 23 |
| abstract_inverted_index.diarization. | 176 |
| abstract_inverted_index.applications, | 49 |
| abstract_inverted_index.state-of-the-art | 97, 166 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 4 |
| citation_normalized_percentile | |
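
The `abstract_inverted_index.*` rows in the table above store the abstract as a map from each word to the zero-based positions where it occurs. A minimal sketch of how the plain text can be recovered from such a map is given below. The function name `rebuild_abstract` and the `sample` dictionary are illustrative assumptions, not part of OpenAlex; `sample` copies only the entries for positions 0 to 11 from the rows above.

```python
def rebuild_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild running text from a word -> positions mapping (hypothetical helper)."""
    # Invert the mapping: position -> word.
    by_position: dict[int, str] = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = word
    # Emit the words in position order, separated by single spaces.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Entries for positions 0..11, copied from the table rows above.
sample = {
    "Nowadays,": [0],
    "the": [1, 10],
    "large": [2],
    "amount": [3],
    "of": [4],
    "audio-visual": [5],
    "content": [6],
    "available": [7],
    "has": [8],
    "fostered": [9],
    "need": [11],
}

print(rebuild_abstract(sample))
```

Passing the full set of rows in the same way should yield the complete abstract. Because positions missing from the mapping are simply skipped, a partial index produces a partial text rather than raising an error.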