V2S attack: building DNN-based voice conversion from automatic speaker verification
2019 · Open Access · DOI: https://doi.org/10.48550/arxiv.1908.01454
This paper presents a new voice impersonation attack using voice conversion (VC). Enrolling personal voices for automatic speaker verification (ASV) offers natural and flexible biometric authentication systems. In principle, ASV systems do not include the users' voice data. However, if an ASV system is unexpectedly exposed and hacked by a malicious attacker, there is a risk that the attacker will use VC techniques to reproduce the enrolled users' voices. We name this the "verification-to-synthesis (V2S) attack" and propose VC training that uses the ASV model and a pre-trained automatic speech recognition (ASR) model, without the targeted speaker's voice data. The VC model reproduces the targeted speaker's individuality by deceiving the ASV model and restores the phonetic properties of an input voice by matching phonetic posteriorgrams predicted by the ASR model. The experimental evaluation compares converted voices from the proposed method, which does not use the targeted speaker's voice data, with those from standard VC, which uses the data. The experimental results demonstrate that the proposed method performs comparably to existing VC methods trained using a very small amount of parallel voice data.
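
The two training signals named in the abstract (deceiving the ASV model and matching phonetic posteriorgrams from the ASR model) can be illustrated with a minimal sketch. The abstract does not give the loss functions, so everything here is an assumption for illustration: the negative-log form of the ASV term, the mean-squared-error form of the posteriorgram term, the `weight` parameter, and all function names are hypothetical and not taken from the paper.

```python
import math
from typing import Sequence


def asv_deception_loss(target_score: float, eps: float = 1e-12) -> float:
    """Assumed ASV term: negative log of the ASV score for the target speaker.

    `target_score` is taken to be a probability-like value in (0, 1];
    the loss is 0 when the ASV model fully accepts the converted voice.
    """
    return -math.log(max(target_score, eps))


def ppg_matching_loss(
    ppg_in: Sequence[Sequence[float]],
    ppg_out: Sequence[Sequence[float]],
) -> float:
    """Assumed phonetic term: mean squared error between two posteriorgrams.

    Each posteriorgram is a list of frames; each frame is a list of
    per-phoneme posterior probabilities predicted by the ASR model.
    """
    if len(ppg_in) != len(ppg_out) or not ppg_in:
        raise ValueError("posteriorgrams must be non-empty and equally long")
    total, count = 0.0, 0
    for frame_in, frame_out in zip(ppg_in, ppg_out):
        if len(frame_in) != len(frame_out):
            raise ValueError("frames must have the same number of classes")
        for a, b in zip(frame_in, frame_out):
            total += (a - b) ** 2
            count += 1
    return total / count


def v2s_objective(
    target_score: float,
    ppg_in: Sequence[Sequence[float]],
    ppg_out: Sequence[Sequence[float]],
    weight: float = 1.0,
) -> float:
    """Weighted sum of the two terms (the weighting scheme is hypothetical)."""
    return asv_deception_loss(target_score) + weight * ppg_matching_loss(ppg_in, ppg_out)
```

In this sketch, a VC model would be trained to lower `v2s_objective`: the first term falls as the ASV model scores the converted voice closer to the targeted speaker, and the second falls as the converted voice keeps the phonetic content of the input. Neither term needs recordings of the targeted speaker, which is the property the abstract emphasizes.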
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/1908.01454, https://arxiv.org/pdf/1908.01454
- OA Status: green
- Cited By: 3
- References: 3
- Related Works: 10
- OpenAlex ID: https://openalex.org/W2965290553
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W2965290553 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.1908.01454 (Digital Object Identifier)
- Title: V2S attack: building DNN-based voice conversion from automatic speaker verification (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2019 (year of publication)
- Publication date: 2019-08-05 (full publication date if available)
- Authors: Taiki Nakamura, Yuki Saito, Shinnosuke Takamichi, Yusuke Ijima, Hiroshi Saruwatari (list of authors in order)
- Landing page: https://arxiv.org/abs/1908.01454 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/1908.01454 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/1908.01454 (direct OA link when available)
- Concepts: Computer science, Speech recognition, Speaker recognition, Speaker verification, Authentication (law), Biometrics, Matching (statistics), Speaker diarisation, Artificial intelligence, Computer security, Statistics, Mathematics (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 3 (total citation count in OpenAlex)
- Citations by year (recent): 2021: 2, 2019: 1 (per-year citation counts, last 5 years)
- References (count): 3 (number of works referenced by this work)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W2965290553 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.1908.01454 |
| ids.doi | https://doi.org/10.48550/arxiv.1908.01454 |
| ids.mag | 2965290553 |
| ids.openalex | https://openalex.org/W2965290553 |
| fwci | |
| type | preprint |
| title | V2S attack: building DNN-based voice conversion from automatic speaker verification |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10201 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9998999834060669 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Speech Recognition and Synthesis |
| topics[1].id | https://openalex.org/T10860 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9952999949455261 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1711 |
| topics[1].subfield.display_name | Signal Processing |
| topics[1].display_name | Speech and Audio Processing |
| topics[2].id | https://openalex.org/T11309 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9952999949455261 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1711 |
| topics[2].subfield.display_name | Signal Processing |
| topics[2].display_name | Music and Audio Processing |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.7736296653747559 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C28490314 |
| concepts[1].level | 1 |
| concepts[1].score | 0.7268245220184326 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q189436 |
| concepts[1].display_name | Speech recognition |
| concepts[2].id | https://openalex.org/C133892786 |
| concepts[2].level | 2 |
| concepts[2].score | 0.6847974061965942 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q1145189 |
| concepts[2].display_name | Speaker recognition |
| concepts[3].id | https://openalex.org/C2982762665 |
| concepts[3].level | 3 |
| concepts[3].score | 0.6243856549263 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q1145189 |
| concepts[3].display_name | Speaker verification |
| concepts[4].id | https://openalex.org/C148417208 |
| concepts[4].level | 2 |
| concepts[4].score | 0.5373432040214539 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q4825882 |
| concepts[4].display_name | Authentication (law) |
| concepts[5].id | https://openalex.org/C184297639 |
| concepts[5].level | 2 |
| concepts[5].score | 0.5310314893722534 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q177765 |
| concepts[5].display_name | Biometrics |
| concepts[6].id | https://openalex.org/C165064840 |
| concepts[6].level | 2 |
| concepts[6].score | 0.4993007183074951 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q1321061 |
| concepts[6].display_name | Matching (statistics) |
| concepts[7].id | https://openalex.org/C149838564 |
| concepts[7].level | 3 |
| concepts[7].score | 0.4395439028739929 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q7574248 |
| concepts[7].display_name | Speaker diarisation |
| concepts[8].id | https://openalex.org/C154945302 |
| concepts[8].level | 1 |
| concepts[8].score | 0.25407636165618896 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[8].display_name | Artificial intelligence |
| concepts[9].id | https://openalex.org/C38652104 |
| concepts[9].level | 1 |
| concepts[9].score | 0.15682336688041687 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q3510521 |
| concepts[9].display_name | Computer security |
| concepts[10].id | https://openalex.org/C105795698 |
| concepts[10].level | 1 |
| concepts[10].score | 0.0 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q12483 |
| concepts[10].display_name | Statistics |
| concepts[11].id | https://openalex.org/C33923547 |
| concepts[11].level | 0 |
| concepts[11].score | 0.0 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[11].display_name | Mathematics |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.7736296653747559 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/speech-recognition |
| keywords[1].score | 0.7268245220184326 |
| keywords[1].display_name | Speech recognition |
| keywords[2].id | https://openalex.org/keywords/speaker-recognition |
| keywords[2].score | 0.6847974061965942 |
| keywords[2].display_name | Speaker recognition |
| keywords[3].id | https://openalex.org/keywords/speaker-verification |
| keywords[3].score | 0.6243856549263 |
| keywords[3].display_name | Speaker verification |
| keywords[4].id | https://openalex.org/keywords/authentication |
| keywords[4].score | 0.5373432040214539 |
| keywords[4].display_name | Authentication (law) |
| keywords[5].id | https://openalex.org/keywords/biometrics |
| keywords[5].score | 0.5310314893722534 |
| keywords[5].display_name | Biometrics |
| keywords[6].id | https://openalex.org/keywords/matching |
| keywords[6].score | 0.4993007183074951 |
| keywords[6].display_name | Matching (statistics) |
| keywords[7].id | https://openalex.org/keywords/speaker-diarisation |
| keywords[7].score | 0.4395439028739929 |
| keywords[7].display_name | Speaker diarisation |
| keywords[8].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[8].score | 0.25407636165618896 |
| keywords[8].display_name | Artificial intelligence |
| keywords[9].id | https://openalex.org/keywords/computer-security |
| keywords[9].score | 0.15682336688041687 |
| keywords[9].display_name | Computer security |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:1908.01454 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | public-domain |
| locations[0].pdf_url | https://arxiv.org/pdf/1908.01454 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | https://openalex.org/licenses/public-domain |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/1908.01454 |
| locations[1].id | doi:10.48550/arxiv.1908.01454 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.1908.01454 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5104139092 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Taiki Nakamura |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Taiki Nakamura |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5083394213 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-7967-2613 |
| authorships[1].author.display_name | Yuki Saito |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Yuki Saito |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5013050263 |
| authorships[2].author.orcid | https://orcid.org/0000-0003-0520-7847 |
| authorships[2].author.display_name | Shinnosuke Takamichi |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Shinnosuke Takamichi |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5068604686 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Yusuke Ijima |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Yusuke Ijima |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5003814223 |
| authorships[4].author.orcid | https://orcid.org/0000-0003-0876-5617 |
| authorships[4].author.display_name | Hiroshi Saruwatari |
| authorships[4].author_position | last |
| authorships[4].raw_author_name | Hiroshi Saruwatari |
| authorships[4].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/1908.01454 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2019-08-13T00:00:00 |
| display_name | V2S attack: building DNN-based voice conversion from automatic speaker verification |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10201 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9998999834060669 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Speech Recognition and Synthesis |
| related_works | https://openalex.org/W2206035908, https://openalex.org/W66821593, https://openalex.org/W3104966193, https://openalex.org/W1521299571, https://openalex.org/W2162158162, https://openalex.org/W4235705411, https://openalex.org/W4247736853, https://openalex.org/W1493012537, https://openalex.org/W1999004162, https://openalex.org/W2144470400 |
| cited_by_count | 3 |
| counts_by_year[0].year | 2021 |
| counts_by_year[0].cited_by_count | 2 |
| counts_by_year[1].year | 2019 |
| counts_by_year[1].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:1908.01454 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | public-domain |
| best_oa_location.pdf_url | https://arxiv.org/pdf/1908.01454 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | https://openalex.org/licenses/public-domain |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/1908.01454 |
| primary_location.id | pmh:oai:arXiv.org:1908.01454 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | public-domain |
| primary_location.pdf_url | https://arxiv.org/pdf/1908.01454 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | https://openalex.org/licenses/public-domain |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/1908.01454 |
| publication_date | 2019-08-05 |
| publication_year | 2019 |
| referenced_works | https://openalex.org/W2963971656, https://openalex.org/W2046056978, https://openalex.org/W2518172956 |
| referenced_works_count | 3 |
| abstract_inverted_index.a | 3, 49, 54, 172 |
| abstract_inverted_index.VC | 61, 78, 98, 149, 167 |
| abstract_inverted_index.We | 69 |
| abstract_inverted_index.an | 115 |
| abstract_inverted_index.by | 48, 105, 118, 123 |
| abstract_inverted_index.do | 31 |
| abstract_inverted_index.if | 39 |
| abstract_inverted_index.is | 43, 53 |
| abstract_inverted_index.of | 114, 176 |
| abstract_inverted_index.to | 63, 164 |
| abstract_inverted_index.ASR | 125 |
| abstract_inverted_index.ASV | 29, 41, 82, 108 |
| abstract_inverted_index.The | 97, 127, 154 |
| abstract_inverted_index.and | 22, 46, 76, 83, 90, 110, 146 |
| abstract_inverted_index.for | 15 |
| abstract_inverted_index.new | 4 |
| abstract_inverted_index.not | 32, 139 |
| abstract_inverted_index.the | 28, 34, 40, 57, 65, 72, 81, 92, 101, 107, 124, 134, 141, 147, 152, 159, 165 |
| abstract_inverted_index.use | 60, 140 |
| abstract_inverted_index.This | 0 |
| abstract_inverted_index.data | 145 |
| abstract_inverted_index.does | 138 |
| abstract_inverted_index.name | 70 |
| abstract_inverted_index.risk | 55 |
| abstract_inverted_index.that | 56, 137, 150, 158, 169 |
| abstract_inverted_index.this | 71 |
| abstract_inverted_index.uses | 151 |
| abstract_inverted_index.very | 173 |
| abstract_inverted_index.will | 59 |
| abstract_inverted_index.with | 80 |
| abstract_inverted_index.(ASR) | 88 |
| abstract_inverted_index.(ASV) | 19 |
| abstract_inverted_index.(V2S) | 74 |
| abstract_inverted_index.(VC). | 11 |
| abstract_inverted_index.data. | 37, 96, 153, 179 |
| abstract_inverted_index.input | 116 |
| abstract_inverted_index.model | 99, 109 |
| abstract_inverted_index.paper | 1 |
| abstract_inverted_index.small | 174 |
| abstract_inverted_index.there | 52 |
| abstract_inverted_index.using | 8, 171 |
| abstract_inverted_index.voice | 5, 9, 36, 95, 117, 144, 178 |
| abstract_inverted_index.amount | 175 |
| abstract_inverted_index.attack | 7 |
| abstract_inverted_index.hacked | 47 |
| abstract_inverted_index.method | 136, 161 |
| abstract_inverted_index.model. | 126 |
| abstract_inverted_index.models | 89 |
| abstract_inverted_index.offers | 20 |
| abstract_inverted_index.speech | 86 |
| abstract_inverted_index.system | 42 |
| abstract_inverted_index.user's | 67 |
| abstract_inverted_index.users' | 35 |
| abstract_inverted_index.voices | 14, 132 |
| abstract_inverted_index.between | 133 |
| abstract_inverted_index.exposed | 45 |
| abstract_inverted_index.include | 33 |
| abstract_inverted_index.methods | 168 |
| abstract_inverted_index.natural | 21 |
| abstract_inverted_index.propose | 77 |
| abstract_inverted_index.results | 156 |
| abstract_inverted_index.speaker | 17 |
| abstract_inverted_index.systems | 30 |
| abstract_inverted_index.trained | 170 |
| abstract_inverted_index.voices. | 68 |
| abstract_inverted_index.without | 91 |
| abstract_inverted_index.However, | 38 |
| abstract_inverted_index.attack'' | 75 |
| abstract_inverted_index.attacker | 58 |
| abstract_inverted_index.compares | 130 |
| abstract_inverted_index.enrolled | 66 |
| abstract_inverted_index.existing | 166 |
| abstract_inverted_index.flexible | 23 |
| abstract_inverted_index.matching | 119 |
| abstract_inverted_index.parallel | 177 |
| abstract_inverted_index.performs | 162 |
| abstract_inverted_index.personal | 13 |
| abstract_inverted_index.phonetic | 112, 120 |
| abstract_inverted_index.presents | 2 |
| abstract_inverted_index.property | 113 |
| abstract_inverted_index.proposed | 135, 160 |
| abstract_inverted_index.restores | 111 |
| abstract_inverted_index.standard | 148 |
| abstract_inverted_index.systems. | 26 |
| abstract_inverted_index.targeted | 93, 102, 142 |
| abstract_inverted_index.training | 79 |
| abstract_inverted_index.Enrolling | 12 |
| abstract_inverted_index.attacker, | 51 |
| abstract_inverted_index.automatic | 16, 85 |
| abstract_inverted_index.biometric | 24 |
| abstract_inverted_index.converted | 131 |
| abstract_inverted_index.deceiving | 106 |
| abstract_inverted_index.malicious | 50 |
| abstract_inverted_index.predicted | 122 |
| abstract_inverted_index.reproduce | 64 |
| abstract_inverted_index.speaker's | 94, 103, 143 |
| abstract_inverted_index.Basically, | 27 |
| abstract_inverted_index.comparably | 163 |
| abstract_inverted_index.conversion | 10 |
| abstract_inverted_index.evaluation | 129 |
| abstract_inverted_index.reproduces | 100 |
| abstract_inverted_index.techniques | 62 |
| abstract_inverted_index.demonstrate | 157 |
| abstract_inverted_index.pre-trained | 84 |
| abstract_inverted_index.recognition | 87 |
| abstract_inverted_index.experimental | 128, 155 |
| abstract_inverted_index.unexpectedly | 44 |
| abstract_inverted_index.verification | 18 |
| abstract_inverted_index.impersonation | 6 |
| abstract_inverted_index.individuality | 104 |
| abstract_inverted_index.authentication | 25 |
| abstract_inverted_index.posteriorgrams | 121 |
| abstract_inverted_index.``verification-to-synthesis | 73 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 5 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/16 |
| sustainable_development_goals[0].score | 0.4099999964237213 |
| sustainable_development_goals[0].display_name | Peace, Justice and strong institutions |
| citation_normalized_percentile |
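
The `abstract_inverted_index.*` rows in the table above store the abstract as a map from each token to the word positions where it occurs. A minimal sketch of how such an index can be turned back into running text follows. The function name `rebuild_abstract` is hypothetical, and the sample dictionary is only a truncated excerpt of the rows above (positions 0 to 11), not the full index.

```python
from typing import Mapping, Sequence


def rebuild_abstract(inverted_index: Mapping[str, Sequence[int]]) -> str:
    """Rebuild text from a token -> positions map by sorting on position."""
    by_position = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Excerpt of the index shown in the table, limited to positions 0..11.
sample_index = {
    "This": [0],
    "paper": [1],
    "presents": [2],
    "a": [3],
    "new": [4],
    "voice": [5, 9],
    "impersonation": [6],
    "attack": [7],
    "using": [8],
    "conversion": [10],
    "(VC).": [11],
}

first_sentence = rebuild_abstract(sample_index)
print(first_sentence)
```

Applied to the excerpt, this yields the first sentence of the abstract. Tokens keep their attached punctuation (for example `(VC).`), so joining on single spaces is enough to recover readable text.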