Multi-query multi-head attention pooling and Inter-topK penalty for speaker verification
· 2021
· Open Access
· DOI: https://doi.org/10.48550/arxiv.2110.05042
This paper describes the multi-query multi-head attention (MQMHA) pooling and inter-topK penalty methods, which were first proposed in our system description submitted to the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2021. Most multi-head attention pooling mechanisms either attend to the whole feature through multiple heads or attend to several split parts of the whole feature. Our proposed MQMHA combines these two mechanisms and gains more diversified information. Margin-based softmax loss functions are commonly adopted to obtain discriminative speaker representations. To further enhance inter-class discriminability, we propose a method that adds an extra inter-topK penalty on some confused speakers. By adopting both MQMHA and the inter-topK penalty, we achieved state-of-the-art performance on all of the public VoxCeleb test sets.
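The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract gives no formulas, so several choices here are assumptions: `mqmha_pool` scores frames with a plain dot product against hypothetical query vectors and pools only an attention-weighted mean, and `inter_topk_am_softmax` assumes an additive-margin softmax in which the K non-target speakers with the highest cosine similarity receive an extra additive penalty. All function names, parameter names, and default values are illustrative.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mqmha_pool(frames, queries):
    """Pool T frames of dimension D into one vector of length Q * D.

    frames:  T x D list of frame-level features.
    queries: H x Q x (D // H) list of query vectors (hypothetical
             parameters; dot-product scoring is an assumption).
    """
    num_heads = len(queries)
    dim = len(frames[0])
    assert dim % num_heads == 0, "feature dim must split evenly across heads"
    part = dim // num_heads
    pooled = []
    for h, head_queries in enumerate(queries):
        # Each head attends to its own slice of the feature dimension.
        chunk = [f[h * part:(h + 1) * part] for f in frames]
        for q in head_queries:
            # Several queries per head give several views of the same slice.
            scores = [sum(a * b for a, b in zip(q, c)) for c in chunk]
            weights = softmax(scores)  # normalised over time
            for d in range(part):
                pooled.append(sum(w * c[d] for w, c in zip(weights, chunk)))
    return pooled


def inter_topk_am_softmax(cosines, target, scale=32.0, margin=0.2,
                          penalty=0.06, k=5):
    """Loss for one sample given cosine similarities to every speaker centre.

    Assumed form: the target logit is reduced by `margin`, and the k most
    similar non-target speakers (the "confused" ones) are raised by `penalty`.
    """
    non_target = [j for j in range(len(cosines)) if j != target]
    ranked = sorted(non_target, key=lambda j: cosines[j], reverse=True)
    hardest = set(ranked[:k])
    logits = []
    for j, c in enumerate(cosines):
        if j == target:
            logits.append(scale * (c - margin))
        elif j in hardest:
            logits.append(scale * (c + penalty))
        else:
            logits.append(scale * c)
    return -math.log(softmax(logits)[target])
```

With H heads and Q queries per head, the pooled vector has Q times the input dimension, which is one way to read "more diversified information". Setting `penalty=0` or `k=0` in the loss reduces it to a plain additive-margin softmax under the same assumptions.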
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2110.05042, https://arxiv.org/pdf/2110.05042
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4303858732
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4303858732 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2110.05042 (Digital Object Identifier)
- Title: Multi-query multi-head attention pooling and Inter-topK penalty for speaker verification (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2021 (year of publication)
- Publication date: 2021-10-11 (full publication date if available)
- Authors: Miao Zhao, Yufeng Ma, Yiwei Ding, Yu Zheng, Min Liu, Minqiang Xu (list of authors in order)
- Landing page: https://arxiv.org/abs/2110.05042 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2110.05042 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2110.05042 (direct OA link when available)
- Concepts: Pooling, Discriminative model, Softmax function, Computer science, Margin (machine learning), Feature (linguistics), Head (geology), Artificial intelligence, Pattern recognition (psychology), Speech recognition, Machine learning, Natural language processing, Convolutional neural network, Geomorphology, Geology, Linguistics, Philosophy (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4303858732 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2110.05042 |
| ids.doi | https://doi.org/10.48550/arxiv.2110.05042 |
| ids.openalex | https://openalex.org/W4303858732 |
| fwci | |
| type | preprint |
| title | Multi-query multi-head attention pooling and Inter-topK penalty for speaker verification |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10201 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9994999766349792 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Speech Recognition and Synthesis |
| topics[1].id | https://openalex.org/T10860 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9879999756813049 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1711 |
| topics[1].subfield.display_name | Signal Processing |
| topics[1].display_name | Speech and Audio Processing |
| topics[2].id | https://openalex.org/T11309 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9531000256538391 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1711 |
| topics[2].subfield.display_name | Signal Processing |
| topics[2].display_name | Music and Audio Processing |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C70437156 |
| concepts[0].level | 2 |
| concepts[0].score | 0.9099464416503906 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q7228652 |
| concepts[0].display_name | Pooling |
| concepts[1].id | https://openalex.org/C97931131 |
| concepts[1].level | 2 |
| concepts[1].score | 0.842678427696228 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q5282087 |
| concepts[1].display_name | Discriminative model |
| concepts[2].id | https://openalex.org/C188441871 |
| concepts[2].level | 3 |
| concepts[2].score | 0.8134415149688721 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q7554146 |
| concepts[2].display_name | Softmax function |
| concepts[3].id | https://openalex.org/C41008148 |
| concepts[3].level | 0 |
| concepts[3].score | 0.7654459476470947 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[3].display_name | Computer science |
| concepts[4].id | https://openalex.org/C774472 |
| concepts[4].level | 2 |
| concepts[4].score | 0.6358084678649902 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q6760393 |
| concepts[4].display_name | Margin (machine learning) |
| concepts[5].id | https://openalex.org/C2776401178 |
| concepts[5].level | 2 |
| concepts[5].score | 0.6302101016044617 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q12050496 |
| concepts[5].display_name | Feature (linguistics) |
| concepts[6].id | https://openalex.org/C2780312720 |
| concepts[6].level | 2 |
| concepts[6].score | 0.507874071598053 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q5689100 |
| concepts[6].display_name | Head (geology) |
| concepts[7].id | https://openalex.org/C154945302 |
| concepts[7].level | 1 |
| concepts[7].score | 0.49939537048339844 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[7].display_name | Artificial intelligence |
| concepts[8].id | https://openalex.org/C153180895 |
| concepts[8].level | 2 |
| concepts[8].score | 0.4431264102458954 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q7148389 |
| concepts[8].display_name | Pattern recognition (psychology) |
| concepts[9].id | https://openalex.org/C28490314 |
| concepts[9].level | 1 |
| concepts[9].score | 0.4028703570365906 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q189436 |
| concepts[9].display_name | Speech recognition |
| concepts[10].id | https://openalex.org/C119857082 |
| concepts[10].level | 1 |
| concepts[10].score | 0.3880019187927246 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q2539 |
| concepts[10].display_name | Machine learning |
| concepts[11].id | https://openalex.org/C204321447 |
| concepts[11].level | 1 |
| concepts[11].score | 0.3274744153022766 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[11].display_name | Natural language processing |
| concepts[12].id | https://openalex.org/C81363708 |
| concepts[12].level | 2 |
| concepts[12].score | 0.13095435500144958 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q17084460 |
| concepts[12].display_name | Convolutional neural network |
| concepts[13].id | https://openalex.org/C114793014 |
| concepts[13].level | 1 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q52109 |
| concepts[13].display_name | Geomorphology |
| concepts[14].id | https://openalex.org/C127313418 |
| concepts[14].level | 0 |
| concepts[14].score | 0.0 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q1069 |
| concepts[14].display_name | Geology |
| concepts[15].id | https://openalex.org/C41895202 |
| concepts[15].level | 1 |
| concepts[15].score | 0.0 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q8162 |
| concepts[15].display_name | Linguistics |
| concepts[16].id | https://openalex.org/C138885662 |
| concepts[16].level | 0 |
| concepts[16].score | 0.0 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q5891 |
| concepts[16].display_name | Philosophy |
| keywords[0].id | https://openalex.org/keywords/pooling |
| keywords[0].score | 0.9099464416503906 |
| keywords[0].display_name | Pooling |
| keywords[1].id | https://openalex.org/keywords/discriminative-model |
| keywords[1].score | 0.842678427696228 |
| keywords[1].display_name | Discriminative model |
| keywords[2].id | https://openalex.org/keywords/softmax-function |
| keywords[2].score | 0.8134415149688721 |
| keywords[2].display_name | Softmax function |
| keywords[3].id | https://openalex.org/keywords/computer-science |
| keywords[3].score | 0.7654459476470947 |
| keywords[3].display_name | Computer science |
| keywords[4].id | https://openalex.org/keywords/margin |
| keywords[4].score | 0.6358084678649902 |
| keywords[4].display_name | Margin (machine learning) |
| keywords[5].id | https://openalex.org/keywords/feature |
| keywords[5].score | 0.6302101016044617 |
| keywords[5].display_name | Feature (linguistics) |
| keywords[6].id | https://openalex.org/keywords/head |
| keywords[6].score | 0.507874071598053 |
| keywords[6].display_name | Head (geology) |
| keywords[7].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[7].score | 0.49939537048339844 |
| keywords[7].display_name | Artificial intelligence |
| keywords[8].id | https://openalex.org/keywords/pattern-recognition |
| keywords[8].score | 0.4431264102458954 |
| keywords[8].display_name | Pattern recognition (psychology) |
| keywords[9].id | https://openalex.org/keywords/speech-recognition |
| keywords[9].score | 0.4028703570365906 |
| keywords[9].display_name | Speech recognition |
| keywords[10].id | https://openalex.org/keywords/machine-learning |
| keywords[10].score | 0.3880019187927246 |
| keywords[10].display_name | Machine learning |
| keywords[11].id | https://openalex.org/keywords/natural-language-processing |
| keywords[11].score | 0.3274744153022766 |
| keywords[11].display_name | Natural language processing |
| keywords[12].id | https://openalex.org/keywords/convolutional-neural-network |
| keywords[12].score | 0.13095435500144958 |
| keywords[12].display_name | Convolutional neural network |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2110.05042 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2110.05042 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2110.05042 |
| locations[1].id | doi:10.48550/arxiv.2110.05042 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2110.05042 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5101551259 |
| authorships[0].author.orcid | https://orcid.org/0000-0001-8590-0654 |
| authorships[0].author.display_name | Miao Zhao |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Zhao, Miao |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5101360228 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Yufeng Ma |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Ma, Yufeng |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5104022548 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-4076-8546 |
| authorships[2].author.display_name | Yiwei Ding |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Ding, Yiwei |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5107092573 |
| authorships[3].author.orcid | https://orcid.org/0000-0003-0757-4210 |
| authorships[3].author.display_name | Yu Zheng |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Zheng, Yu |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5100343919 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-8902-5460 |
| authorships[4].author.display_name | Min Liu |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Liu, Min |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5100413867 |
| authorships[5].author.orcid | https://orcid.org/0000-0003-3625-1736 |
| authorships[5].author.display_name | Minqiang Xu |
| authorships[5].author_position | last |
| authorships[5].raw_author_name | Xu, Minqiang |
| authorships[5].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2110.05042 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2022-10-09T00:00:00 |
| display_name | Multi-query multi-head attention pooling and Inter-topK penalty for speaker verification |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10201 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9994999766349792 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Speech Recognition and Synthesis |
| related_works | https://openalex.org/W3095152779, https://openalex.org/W3119773509, https://openalex.org/W3128220219, https://openalex.org/W2982889384, https://openalex.org/W4226227567, https://openalex.org/W3134502938, https://openalex.org/W2971218105, https://openalex.org/W3006353185, https://openalex.org/W3010284783, https://openalex.org/W4287113729 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2110.05042 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2110.05042 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2110.05042 |
| primary_location.id | pmh:oai:arXiv.org:2110.05042 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2110.05042 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2110.05042 |
| publication_date | 2021-10-11 |
| publication_year | 2021 |
| referenced_works_count | 0 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 6 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/10 |
| sustainable_development_goals[0].score | 0.6700000166893005 |
| sustainable_development_goals[0].display_name | Reduced inequalities |
| citation_normalized_percentile |