Poformer: A simple pooling transformer for speaker verification
2021 · Open Access · DOI: https://doi.org/10.48550/arxiv.2110.04692
Most recent speaker verification systems are based on extracting speaker embeddings using a deep neural network. The pooling layer in the network aggregates the frame-level features extracted by the backbone. In this paper, we propose a new transformer-based pooling structure called PoFormer to enhance the ability of the pooling layer to capture information along the whole time axis. Unlike previous works that apply the attention mechanism in a simple way or implement the multi-head mechanism in serial rather than in parallel, PoFormer follows the original transformer structure with some minor modifications, such as a positional encoding generator, drop path, and LayerScale, to make training more stable and to prevent overfitting. Evaluated on various datasets, PoFormer outperforms the existing pooling system with at least a 13.00% improvement in EER and a 9.12% improvement in minDCF.
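
To make the role of the pooling layer concrete, here is a minimal sketch of aggregating frame-level features over the time axis. It is not PoFormer: it contrasts plain mean pooling with a single-query scaled dot-product attention pooling, which is closer to the simple attention baselines the abstract mentions. The function names, the toy `frames`, and the `query` vector are all illustrative assumptions, not taken from the paper.

```python
import math


def mean_pool(frames):
    """Average a list of T frame vectors (each of length D) over time."""
    num_frames = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / num_frames for d in range(dim)]


def attention_pool(frames, query):
    """Weight frames by softmax(query . frame / sqrt(D)) and sum them.

    `query` stands in for a learned parameter (an assumption for this sketch).
    Returns the pooled vector and the per-frame attention weights.
    """
    dim = len(query)
    scores = [
        sum(q * x for q, x in zip(query, frame)) / math.sqrt(dim)
        for frame in frames
    ]
    # Subtract the maximum score for a numerically stable softmax.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, frames))
        for d in range(dim)
    ]
    return pooled, weights


# Toy input: T = 3 frames, D = 2 features per frame (made-up values).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [0.0, 0.0]  # a zero query gives uniform weights
pooled, weights = attention_pool(frames, query)
```

With a zero query every frame receives the same weight, so attention pooling reduces to mean pooling; a non-zero query lets some frames contribute more than others to the utterance-level embedding.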
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2110.04692, https://arxiv.org/pdf/2110.04692
- OA Status: green
- Cited By: 1
- References: 20
- Related Works: 10
- OpenAlex ID: https://openalex.org/W3207052039
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W3207052039 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2110.04692 (Digital Object Identifier)
- Title: Poformer: A simple pooling transformer for speaker verification (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2021 (year of publication)
- Publication date: 2021-10-10 (full publication date if available)
- Authors: Yufeng Ma, Yiwei Ding, Miao Zhao, Yu Zheng, Min Liu, Minqiang Xu (list of authors in order)
- Landing page: https://arxiv.org/abs/2110.04692 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2110.04692 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2110.04692 (direct OA link when available)
- Concepts: Pooling, Computer science, Overfitting, Transformer, Artificial intelligence, Speech recognition, Pattern recognition (psychology), Artificial neural network, Engineering, Voltage, Electrical engineering (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 1 (total citation count in OpenAlex)
- Citations by year (recent): 2023: 1 (per-year citation counts, last 5 years)
- References (count): 20 (number of works referenced by this work)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W3207052039 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2110.04692 |
| ids.doi | https://doi.org/10.48550/arxiv.2110.04692 |
| ids.mag | 3207052039 |
| ids.openalex | https://openalex.org/W3207052039 |
| fwci | |
| type | preprint |
| title | Poformer: A simple pooling transformer for speaker verification |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10201 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9998999834060669 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Speech Recognition and Synthesis |
| topics[1].id | https://openalex.org/T11309 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9962000250816345 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1711 |
| topics[1].subfield.display_name | Signal Processing |
| topics[1].display_name | Music and Audio Processing |
| topics[2].id | https://openalex.org/T10860 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9961000084877014 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1711 |
| topics[2].subfield.display_name | Signal Processing |
| topics[2].display_name | Speech and Audio Processing |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C70437156 |
| concepts[0].level | 2 |
| concepts[0].score | 0.8866064548492432 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q7228652 |
| concepts[0].display_name | Pooling |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.7996540069580078 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C22019652 |
| concepts[2].level | 3 |
| concepts[2].score | 0.7804945707321167 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q331309 |
| concepts[2].display_name | Overfitting |
| concepts[3].id | https://openalex.org/C66322947 |
| concepts[3].level | 3 |
| concepts[3].score | 0.6820139288902283 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q11658 |
| concepts[3].display_name | Transformer |
| concepts[4].id | https://openalex.org/C154945302 |
| concepts[4].level | 1 |
| concepts[4].score | 0.5003502368927002 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[4].display_name | Artificial intelligence |
| concepts[5].id | https://openalex.org/C28490314 |
| concepts[5].level | 1 |
| concepts[5].score | 0.3949087858200073 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q189436 |
| concepts[5].display_name | Speech recognition |
| concepts[6].id | https://openalex.org/C153180895 |
| concepts[6].level | 2 |
| concepts[6].score | 0.37835726141929626 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q7148389 |
| concepts[6].display_name | Pattern recognition (psychology) |
| concepts[7].id | https://openalex.org/C50644808 |
| concepts[7].level | 2 |
| concepts[7].score | 0.3741511404514313 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q192776 |
| concepts[7].display_name | Artificial neural network |
| concepts[8].id | https://openalex.org/C127413603 |
| concepts[8].level | 0 |
| concepts[8].score | 0.08184614777565002 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q11023 |
| concepts[8].display_name | Engineering |
| concepts[9].id | https://openalex.org/C165801399 |
| concepts[9].level | 2 |
| concepts[9].score | 0.06680282950401306 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q25428 |
| concepts[9].display_name | Voltage |
| concepts[10].id | https://openalex.org/C119599485 |
| concepts[10].level | 1 |
| concepts[10].score | 0.0 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q43035 |
| concepts[10].display_name | Electrical engineering |
| keywords[0].id | https://openalex.org/keywords/pooling |
| keywords[0].score | 0.8866064548492432 |
| keywords[0].display_name | Pooling |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.7996540069580078 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/overfitting |
| keywords[2].score | 0.7804945707321167 |
| keywords[2].display_name | Overfitting |
| keywords[3].id | https://openalex.org/keywords/transformer |
| keywords[3].score | 0.6820139288902283 |
| keywords[3].display_name | Transformer |
| keywords[4].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[4].score | 0.5003502368927002 |
| keywords[4].display_name | Artificial intelligence |
| keywords[5].id | https://openalex.org/keywords/speech-recognition |
| keywords[5].score | 0.3949087858200073 |
| keywords[5].display_name | Speech recognition |
| keywords[6].id | https://openalex.org/keywords/pattern-recognition |
| keywords[6].score | 0.37835726141929626 |
| keywords[6].display_name | Pattern recognition (psychology) |
| keywords[7].id | https://openalex.org/keywords/artificial-neural-network |
| keywords[7].score | 0.3741511404514313 |
| keywords[7].display_name | Artificial neural network |
| keywords[8].id | https://openalex.org/keywords/engineering |
| keywords[8].score | 0.08184614777565002 |
| keywords[8].display_name | Engineering |
| keywords[9].id | https://openalex.org/keywords/voltage |
| keywords[9].score | 0.06680282950401306 |
| keywords[9].display_name | Voltage |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2110.04692 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2110.04692 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2110.04692 |
| locations[1].id | doi:10.48550/arxiv.2110.04692 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2110.04692 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5024989442 |
| authorships[0].author.orcid | https://orcid.org/0000-0003-2095-5739 |
| authorships[0].author.display_name | Yufeng Ma |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Yufeng Ma |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5000095400 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-8156-3715 |
| authorships[1].author.display_name | Yiwei Ding |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Yiwei Ding |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5074355807 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-4324-1467 |
| authorships[2].author.display_name | Miao Zhao |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Miao Zhao |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5107092573 |
| authorships[3].author.orcid | https://orcid.org/0000-0003-0757-4210 |
| authorships[3].author.display_name | Yu Zheng |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Yu Zheng |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5100343919 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-8902-5460 |
| authorships[4].author.display_name | Min Liu |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Min Liu |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5100413867 |
| authorships[5].author.orcid | https://orcid.org/0000-0003-3625-1736 |
| authorships[5].author.display_name | Minqiang Xu |
| authorships[5].author_position | last |
| authorships[5].raw_author_name | Minqiang Xu |
| authorships[5].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2110.04692 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Poformer: A simple pooling transformer for speaker verification |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10201 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9998999834060669 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Speech Recognition and Synthesis |
| related_works | https://openalex.org/W4362597605, https://openalex.org/W1574414179, https://openalex.org/W3009056573, https://openalex.org/W2922073769, https://openalex.org/W4297676672, https://openalex.org/W4281702477, https://openalex.org/W4378510483, https://openalex.org/W2490526372, https://openalex.org/W3026913501, https://openalex.org/W4318954401 |
| cited_by_count | 1 |
| counts_by_year[0].year | 2023 |
| counts_by_year[0].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2110.04692 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2110.04692 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2110.04692 |
| primary_location.id | pmh:oai:arXiv.org:2110.04692 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2110.04692 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2110.04692 |
| publication_date | 2021-10-10 |
| publication_year | 2021 |
| referenced_works | https://openalex.org/W3198698812, https://openalex.org/W2117671523, https://openalex.org/W2794506738, https://openalex.org/W2808631503, https://openalex.org/W2890964092, https://openalex.org/W2963975324, https://openalex.org/W2748488820, https://openalex.org/W3119786062, https://openalex.org/W3170841864, https://openalex.org/W3024869864, https://openalex.org/W2963341956, https://openalex.org/W3030520226, https://openalex.org/W3202088435, https://openalex.org/W3034772996, https://openalex.org/W2150769028, https://openalex.org/W2726515241, https://openalex.org/W3139049060, https://openalex.org/W2889519245, https://openalex.org/W2963403868, https://openalex.org/W3103152812 |
| referenced_works_count | 20 |
| abstract_inverted_index.a | 12, 36, 69, 94, 126, 132 |
| abstract_inverted_index.In | 31 |
| abstract_inverted_index.at | 124 |
| abstract_inverted_index.by | 28 |
| abstract_inverted_index.in | 19, 68, 77, 81, 129, 135 |
| abstract_inverted_index.of | 48, 80 |
| abstract_inverted_index.on | 7, 114 |
| abstract_inverted_index.or | 72 |
| abstract_inverted_index.to | 23, 44, 52, 102, 110 |
| abstract_inverted_index.we | 34 |
| abstract_inverted_index.EER | 130 |
| abstract_inverted_index.The | 16 |
| abstract_inverted_index.and | 100, 109, 131 |
| abstract_inverted_index.are | 5 |
| abstract_inverted_index.new | 37 |
| abstract_inverted_index.the | 20, 29, 46, 49, 56, 74, 85, 104, 119 |
| abstract_inverted_index.way | 71 |
| abstract_inverted_index.Most | 0 |
| abstract_inverted_index.aims | 22 |
| abstract_inverted_index.deep | 13 |
| abstract_inverted_index.drop | 98 |
| abstract_inverted_index.from | 61 |
| abstract_inverted_index.like | 93 |
| abstract_inverted_index.make | 103 |
| abstract_inverted_index.more | 107 |
| abstract_inverted_index.path | 99 |
| abstract_inverted_index.some | 90 |
| abstract_inverted_index.that | 64 |
| abstract_inverted_index.this | 32 |
| abstract_inverted_index.time | 58 |
| abstract_inverted_index.with | 89, 123 |
| abstract_inverted_index.9.12% | 133 |
| abstract_inverted_index.along | 55 |
| abstract_inverted_index.apply | 65 |
| abstract_inverted_index.axis. | 59 |
| abstract_inverted_index.based | 6, 39 |
| abstract_inverted_index.layer | 18, 51 |
| abstract_inverted_index.least | 125 |
| abstract_inverted_index.minor | 91 |
| abstract_inverted_index.using | 11 |
| abstract_inverted_index.whole | 57 |
| abstract_inverted_index.works | 63 |
| abstract_inverted_index.13.00% | 127 |
| abstract_inverted_index.called | 42 |
| abstract_inverted_index.neural | 14 |
| abstract_inverted_index.paper, | 33 |
| abstract_inverted_index.recent | 1 |
| abstract_inverted_index.serial | 78 |
| abstract_inverted_index.simple | 70 |
| abstract_inverted_index.stable | 108 |
| abstract_inverted_index.system | 122 |
| abstract_inverted_index.ability | 47 |
| abstract_inverted_index.capture | 53 |
| abstract_inverted_index.enhance | 45 |
| abstract_inverted_index.follows | 84 |
| abstract_inverted_index.initial | 86 |
| abstract_inverted_index.instead | 79 |
| abstract_inverted_index.minDCF. | 136 |
| abstract_inverted_index.network | 21 |
| abstract_inverted_index.pooling | 17, 40, 50, 121 |
| abstract_inverted_index.prevent | 111 |
| abstract_inverted_index.propose | 35 |
| abstract_inverted_index.speaker | 2, 9 |
| abstract_inverted_index.systems | 4 |
| abstract_inverted_index.various | 115 |
| abstract_inverted_index.PoFormer | 43, 83, 117 |
| abstract_inverted_index.encoding | 96 |
| abstract_inverted_index.existing | 120 |
| abstract_inverted_index.features | 26 |
| abstract_inverted_index.network. | 15 |
| abstract_inverted_index.previous | 62 |
| abstract_inverted_index.training | 105 |
| abstract_inverted_index.Different | 60 |
| abstract_inverted_index.Evaluated | 113 |
| abstract_inverted_index.aggregate | 24 |
| abstract_inverted_index.attention | 66 |
| abstract_inverted_index.backbone. | 30 |
| abstract_inverted_index.datasets, | 116 |
| abstract_inverted_index.extracted | 27 |
| abstract_inverted_index.implement | 73 |
| abstract_inverted_index.mechanism | 67, 76 |
| abstract_inverted_index.parallel, | 82 |
| abstract_inverted_index.procedure | 106 |
| abstract_inverted_index.structure | 41, 88 |
| abstract_inverted_index.LayerScale | 101 |
| abstract_inverted_index.embeddings | 10 |
| abstract_inverted_index.extracting | 8 |
| abstract_inverted_index.generator, | 97 |
| abstract_inverted_index.multi-head | 75 |
| abstract_inverted_index.positional | 95 |
| abstract_inverted_index.frame-level | 25 |
| abstract_inverted_index.improvement | 128, 134 |
| abstract_inverted_index.information | 54 |
| abstract_inverted_index.outperforms | 118 |
| abstract_inverted_index.transformer | 38, 87 |
| abstract_inverted_index.overfitting. | 112 |
| abstract_inverted_index.verification | 3 |
| abstract_inverted_index.modifications | 92 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 6 |
| citation_normalized_percentile |
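
The `abstract_inverted_index.*` rows in the payload map each word of the abstract to the zero-based positions where it occurs. A minimal sketch of turning such a mapping back into running text follows. The helper name `rebuild_abstract` is hypothetical, and the sample dictionary is only an excerpt restricted to positions 0 to 10 of the rows above, not the full index.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a {word: [positions]} mapping.

    Each word is placed at every position listed for it, then the
    words are joined in ascending position order.
    """
    word_at = {}
    for word, positions in inverted_index.items():
        for position in positions:
            word_at[position] = word
    return " ".join(word_at[position] for position in sorted(word_at))


# Excerpt of the payload, truncated to positions 0..10 for illustration.
sample_index = {
    "Most": [0],
    "recent": [1],
    "speaker": [2, 9],
    "verification": [3],
    "systems": [4],
    "are": [5],
    "based": [6],
    "on": [7],
    "extracting": [8],
    "embeddings": [10],
}

text = rebuild_abstract(sample_index)
print(text)
```

Sorting by position rather than relying on dictionary order matters because a word such as `speaker` appears at more than one position.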