A Bayesian Approach to Estimation of Speaker Normalization Parameters
2016 · Open Access · DOI: https://doi.org/10.48550/arxiv.1610.05948
In this work, a Bayesian approach to speaker normalization is proposed to compensate for the degradation in performance of a speaker-independent speech recognition system. The proposed speaker normalization method uses vocal tract length normalization (VTLN). The VTLN parameters are estimated using a novel Bayesian approach that utilizes the Gibbs sampler, a special type of Markov Chain Monte Carlo method. Additionally, the hyperparameters are estimated using a maximum likelihood approach. This model is used under the assumption that the human vocal tract can be modeled as a tube of uniform cross section. It captures the variation in vocal tract length across speakers more effectively than the linear model used in the literature. The work also investigates estimating the VTLN parameters by minimizing the Mean Square Error (MSE) and the Mean Absolute Error (MAE). Both single-pass and two-pass approaches are then used to build a VTLN-based speech recognizer. Experimental results on recognition of vowels and Hindi phrases from a medium vocabulary indicate that the Bayesian method improves the performance by a considerable margin.
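The abstract names two ingredients that a small example can make concrete: drawing posterior samples with a Gibbs sampler, and turning those samples into point estimates under MSE and MAE criteria. The sketch below is not the authors' model, which this record does not describe. It assumes a toy setting in which noisy scalar observations of a warping factor are Gaussian with unknown mean and precision, with a Normal prior on the mean and a Gamma prior on the precision. The function names, prior values, and data are all hypothetical. It relies only on the standard facts that the posterior mean minimizes expected squared error and the posterior median minimizes expected absolute error.

```python
import numpy as np

def gibbs_normal(x, n_iter=2000, burn_in=500,
                 mu0=1.0, kappa0=1.0, a0=2.0, b0=0.02, seed=0):
    """Gibbs sampler for the mean of a Gaussian with unknown precision.

    Toy model (an assumption, not taken from the paper):
      x_i ~ Normal(mu, 1/tau), mu ~ Normal(mu0, 1/kappa0), tau ~ Gamma(a0, rate=b0).
    Returns the post-burn-in samples of mu.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.size
    mu, tau = x.mean(), 1.0          # arbitrary starting point
    samples = np.empty(n_iter)
    for t in range(n_iter):
        # Full conditional of mu given tau: Normal with updated precision.
        prec = kappa0 + n * tau
        mean = (kappa0 * mu0 + tau * x.sum()) / prec
        mu = rng.normal(mean, 1.0 / np.sqrt(prec))
        # Full conditional of tau given mu: Gamma (NumPy takes scale = 1/rate).
        rate = b0 + 0.5 * np.sum((x - mu) ** 2)
        tau = rng.gamma(a0 + 0.5 * n, 1.0 / rate)
        samples[t] = mu
    return samples[burn_in:]

def point_estimates(samples):
    """Posterior mean (minimizes MSE) and posterior median (minimizes MAE)."""
    return {"mmse": float(np.mean(samples)), "mae": float(np.median(samples))}

# Hypothetical data: 50 noisy observations of a warping factor near 1.1.
observations = np.random.default_rng(1).normal(1.1, 0.05, size=50)
estimates = point_estimates(gibbs_normal(observations))
print(estimates)
```

With a roughly symmetric posterior, as in this toy case, the two estimates nearly coincide; they can differ noticeably when the posterior is skewed.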
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/1610.05948, https://arxiv.org/pdf/1610.05948
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W2538877340
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W2538877340 (Canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.1610.05948 (Digital Object Identifier)
- Title: A Bayesian Approach to Estimation of Speaker Normalization Parameters (Work title)
- Type: preprint (OpenAlex work type)
- Language: en (Primary language)
- Publication year: 2016 (Year of publication)
- Publication date: 2016-10-19 (Full publication date if available)
- Authors: Dhananjay Ram, Debasis Kundu, Rajesh M. Hegde (List of authors in order)
- Landing page: https://arxiv.org/abs/1610.05948 (Publisher landing page)
- PDF URL: https://arxiv.org/pdf/1610.05948 (Direct link to full text PDF)
- Open access: Yes (Whether a free full text is available)
- OA status: green (Open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/1610.05948 (Direct OA link when available)
- Concepts: Vocal tract, Normalization (sociology), Computer science, Gibbs sampling, Bayesian probability, Markov chain Monte Carlo, Speech recognition, Hyperparameter, Pattern recognition (psychology), Artificial intelligence, Mean squared error, Mathematics, Statistics, Anthropology, Sociology (Top concepts (fields/topics) attached by OpenAlex)
- Cited by: 0 (Total citation count in OpenAlex)
- Related works (count): 10 (Other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W2538877340 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.1610.05948 |
| ids.doi | https://doi.org/10.48550/arxiv.1610.05948 |
| ids.mag | 2538877340 |
| ids.openalex | https://openalex.org/W2538877340 |
| fwci | |
| type | preprint |
| title | A Bayesian Approach to Estimation of Speaker Normalization Parameters |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10201 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9864000082015991 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Speech Recognition and Synthesis |
| topics[1].id | https://openalex.org/T10860 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9800999760627747 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1711 |
| topics[1].subfield.display_name | Signal Processing |
| topics[1].display_name | Speech and Audio Processing |
| topics[2].id | https://openalex.org/T10901 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9243000149726868 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1707 |
| topics[2].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[2].display_name | Advanced Data Compression Techniques |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C47401133 |
| concepts[0].level | 2 |
| concepts[0].score | 0.8101339936256409 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q748953 |
| concepts[0].display_name | Vocal tract |
| concepts[1].id | https://openalex.org/C136886441 |
| concepts[1].level | 2 |
| concepts[1].score | 0.7095991969108582 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q926129 |
| concepts[1].display_name | Normalization (sociology) |
| concepts[2].id | https://openalex.org/C41008148 |
| concepts[2].level | 0 |
| concepts[2].score | 0.6489894390106201 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[2].display_name | Computer science |
| concepts[3].id | https://openalex.org/C158424031 |
| concepts[3].level | 3 |
| concepts[3].score | 0.5970686078071594 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q1191905 |
| concepts[3].display_name | Gibbs sampling |
| concepts[4].id | https://openalex.org/C107673813 |
| concepts[4].level | 2 |
| concepts[4].score | 0.5354936718940735 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q812534 |
| concepts[4].display_name | Bayesian probability |
| concepts[5].id | https://openalex.org/C111350023 |
| concepts[5].level | 3 |
| concepts[5].score | 0.5147719383239746 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q1191869 |
| concepts[5].display_name | Markov chain Monte Carlo |
| concepts[6].id | https://openalex.org/C28490314 |
| concepts[6].level | 1 |
| concepts[6].score | 0.511902928352356 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q189436 |
| concepts[6].display_name | Speech recognition |
| concepts[7].id | https://openalex.org/C8642999 |
| concepts[7].level | 2 |
| concepts[7].score | 0.4997570514678955 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q4171168 |
| concepts[7].display_name | Hyperparameter |
| concepts[8].id | https://openalex.org/C153180895 |
| concepts[8].level | 2 |
| concepts[8].score | 0.4694874882698059 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q7148389 |
| concepts[8].display_name | Pattern recognition (psychology) |
| concepts[9].id | https://openalex.org/C154945302 |
| concepts[9].level | 1 |
| concepts[9].score | 0.4387504458427429 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[9].display_name | Artificial intelligence |
| concepts[10].id | https://openalex.org/C139945424 |
| concepts[10].level | 2 |
| concepts[10].score | 0.43498319387435913 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q1940696 |
| concepts[10].display_name | Mean squared error |
| concepts[11].id | https://openalex.org/C33923547 |
| concepts[11].level | 0 |
| concepts[11].score | 0.27845314145088196 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[11].display_name | Mathematics |
| concepts[12].id | https://openalex.org/C105795698 |
| concepts[12].level | 1 |
| concepts[12].score | 0.20156323909759521 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q12483 |
| concepts[12].display_name | Statistics |
| concepts[13].id | https://openalex.org/C19165224 |
| concepts[13].level | 1 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q23404 |
| concepts[13].display_name | Anthropology |
| concepts[14].id | https://openalex.org/C144024400 |
| concepts[14].level | 0 |
| concepts[14].score | 0.0 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q21201 |
| concepts[14].display_name | Sociology |
| keywords[0].id | https://openalex.org/keywords/vocal-tract |
| keywords[0].score | 0.8101339936256409 |
| keywords[0].display_name | Vocal tract |
| keywords[1].id | https://openalex.org/keywords/normalization |
| keywords[1].score | 0.7095991969108582 |
| keywords[1].display_name | Normalization (sociology) |
| keywords[2].id | https://openalex.org/keywords/computer-science |
| keywords[2].score | 0.6489894390106201 |
| keywords[2].display_name | Computer science |
| keywords[3].id | https://openalex.org/keywords/gibbs-sampling |
| keywords[3].score | 0.5970686078071594 |
| keywords[3].display_name | Gibbs sampling |
| keywords[4].id | https://openalex.org/keywords/bayesian-probability |
| keywords[4].score | 0.5354936718940735 |
| keywords[4].display_name | Bayesian probability |
| keywords[5].id | https://openalex.org/keywords/markov-chain-monte-carlo |
| keywords[5].score | 0.5147719383239746 |
| keywords[5].display_name | Markov chain Monte Carlo |
| keywords[6].id | https://openalex.org/keywords/speech-recognition |
| keywords[6].score | 0.511902928352356 |
| keywords[6].display_name | Speech recognition |
| keywords[7].id | https://openalex.org/keywords/hyperparameter |
| keywords[7].score | 0.4997570514678955 |
| keywords[7].display_name | Hyperparameter |
| keywords[8].id | https://openalex.org/keywords/pattern-recognition |
| keywords[8].score | 0.4694874882698059 |
| keywords[8].display_name | Pattern recognition (psychology) |
| keywords[9].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[9].score | 0.4387504458427429 |
| keywords[9].display_name | Artificial intelligence |
| keywords[10].id | https://openalex.org/keywords/mean-squared-error |
| keywords[10].score | 0.43498319387435913 |
| keywords[10].display_name | Mean squared error |
| keywords[11].id | https://openalex.org/keywords/mathematics |
| keywords[11].score | 0.27845314145088196 |
| keywords[11].display_name | Mathematics |
| keywords[12].id | https://openalex.org/keywords/statistics |
| keywords[12].score | 0.20156323909759521 |
| keywords[12].display_name | Statistics |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:1610.05948 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/1610.05948 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/1610.05948 |
| locations[1].id | doi:10.48550/arxiv.1610.05948 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.1610.05948 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5103133511 |
| authorships[0].author.orcid | https://orcid.org/0000-0003-1822-9199 |
| authorships[0].author.display_name | Dhananjay Ram |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Dhananjay Ram |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5049715298 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-9141-422X |
| authorships[1].author.display_name | Debasis Kundu |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Debasis Kundu |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5085503354 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-6142-7724 |
| authorships[2].author.display_name | Rajesh M. Hegde |
| authorships[2].author_position | last |
| authorships[2].raw_author_name | Rajesh M. Hegde |
| authorships[2].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/1610.05948 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | A Bayesian Approach to Estimation of Speaker Normalization Parameters |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10201 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9864000082015991 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Speech Recognition and Synthesis |
| related_works | https://openalex.org/W1911592522, https://openalex.org/W2087669554, https://openalex.org/W3044757496, https://openalex.org/W3125971950, https://openalex.org/W1580681286, https://openalex.org/W2116700007, https://openalex.org/W2175355783, https://openalex.org/W2622204791, https://openalex.org/W1579866848, https://openalex.org/W2066716418 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:1610.05948 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/1610.05948 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/1610.05948 |
| primary_location.id | pmh:oai:arXiv.org:1610.05948 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/1610.05948 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/1610.05948 |
| publication_date | 2016-10-19 |
| publication_year | 2016 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 3, 19, 46, 55, 86, 151, 166, 178 |
| abstract_inverted_index.In | 0 |
| abstract_inverted_index.It | 92 |
| abstract_inverted_index.as | 85 |
| abstract_inverted_index.be | 83 |
| abstract_inverted_index.by | 177 |
| abstract_inverted_index.in | 16, 96, 112 |
| abstract_inverted_index.is | 9, 75 |
| abstract_inverted_index.of | 18, 34, 58, 88, 98, 102, 123, 136, 160 |
| abstract_inverted_index.on | 158 |
| abstract_inverted_index.to | 6, 11, 149 |
| abstract_inverted_index.The | 25, 40, 114 |
| abstract_inverted_index.and | 128, 142, 162 |
| abstract_inverted_index.are | 43, 67, 146 |
| abstract_inverted_index.can | 82 |
| abstract_inverted_index.for | 13, 133 |
| abstract_inverted_index.has | 116 |
| abstract_inverted_index.the | 14, 32, 52, 65, 94, 99, 108, 134, 171, 175 |
| abstract_inverted_index.two | 143 |
| abstract_inverted_index.Both | 139 |
| abstract_inverted_index.Mean | 124, 129 |
| abstract_inverted_index.This | 73 |
| abstract_inverted_index.VTLN | 41, 137, 152 |
| abstract_inverted_index.also | 117 |
| abstract_inverted_index.from | 165 |
| abstract_inverted_index.like | 121 |
| abstract_inverted_index.more | 105 |
| abstract_inverted_index.pass | 141, 144 |
| abstract_inverted_index.than | 107 |
| abstract_inverted_index.that | 78, 170 |
| abstract_inverted_index.then | 147 |
| abstract_inverted_index.this | 1 |
| abstract_inverted_index.tube | 87 |
| abstract_inverted_index.type | 57 |
| abstract_inverted_index.used | 76, 111, 148 |
| abstract_inverted_index.uses | 31 |
| abstract_inverted_index.work | 115 |
| abstract_inverted_index.(MAE) | 132 |
| abstract_inverted_index.(MSE) | 127 |
| abstract_inverted_index.Carlo | 62 |
| abstract_inverted_index.Chain | 60 |
| abstract_inverted_index.Error | 126, 131 |
| abstract_inverted_index.Gibbs | 53 |
| abstract_inverted_index.Hindi | 163 |
| abstract_inverted_index.Monte | 61 |
| abstract_inverted_index.based | 153 |
| abstract_inverted_index.build | 150 |
| abstract_inverted_index.cross | 90 |
| abstract_inverted_index.human | 79 |
| abstract_inverted_index.model | 74, 110 |
| abstract_inverted_index.novel | 47 |
| abstract_inverted_index.tract | 36, 81, 101 |
| abstract_inverted_index.using | 45, 69 |
| abstract_inverted_index.vocal | 35, 80, 100 |
| abstract_inverted_index.which | 50 |
| abstract_inverted_index.work, | 2 |
| abstract_inverted_index.Markov | 59 |
| abstract_inverted_index.Square | 125 |
| abstract_inverted_index.herein | 30 |
| abstract_inverted_index.length | 37, 97 |
| abstract_inverted_index.linear | 109 |
| abstract_inverted_index.medium | 167 |
| abstract_inverted_index.method | 28, 173 |
| abstract_inverted_index.single | 140 |
| abstract_inverted_index.speech | 22, 154 |
| abstract_inverted_index.vowels | 161 |
| abstract_inverted_index.(VTLN). | 39 |
| abstract_inverted_index.margin. | 180 |
| abstract_inverted_index.maximum | 70 |
| abstract_inverted_index.method. | 63 |
| abstract_inverted_index.methods | 120 |
| abstract_inverted_index.modeled | 84 |
| abstract_inverted_index.phrases | 164 |
| abstract_inverted_index.results | 157 |
| abstract_inverted_index.speaker | 7, 20, 26 |
| abstract_inverted_index.special | 56 |
| abstract_inverted_index.system. | 24 |
| abstract_inverted_index.uniform | 89 |
| abstract_inverted_index.Absolute | 130 |
| abstract_inverted_index.Bayesian | 4, 48, 172 |
| abstract_inverted_index.approach | 5, 49 |
| abstract_inverted_index.assuming | 77 |
| abstract_inverted_index.captures | 93 |
| abstract_inverted_index.improves | 174 |
| abstract_inverted_index.indicate | 169 |
| abstract_inverted_index.proposed | 10, 29 |
| abstract_inverted_index.sampler, | 54 |
| abstract_inverted_index.section. | 91 |
| abstract_inverted_index.speakers | 104 |
| abstract_inverted_index.utilizes | 51 |
| abstract_inverted_index.approach. | 72 |
| abstract_inverted_index.different | 103, 119 |
| abstract_inverted_index.estimated | 44, 68 |
| abstract_inverted_index.technique | 33 |
| abstract_inverted_index.variation | 95 |
| abstract_inverted_index.approaches | 145 |
| abstract_inverted_index.compensate | 12 |
| abstract_inverted_index.estimation | 135 |
| abstract_inverted_index.likelihood | 71 |
| abstract_inverted_index.parameters | 42 |
| abstract_inverted_index.vocabulary | 168 |
| abstract_inverted_index.degradation | 15 |
| abstract_inverted_index.independent | 21 |
| abstract_inverted_index.literature. | 113 |
| abstract_inverted_index.parameters. | 138 |
| abstract_inverted_index.performance | 17, 176 |
| abstract_inverted_index.recognition | 23, 159 |
| abstract_inverted_index.recognizer. | 155 |
| abstract_inverted_index.Additionally | 64 |
| abstract_inverted_index.Experimental | 156 |
| abstract_inverted_index.considerable | 179 |
| abstract_inverted_index.effectively, | 106 |
| abstract_inverted_index.investigated | 118 |
| abstract_inverted_index.minimization | 122 |
| abstract_inverted_index.normalization | 8, 27, 38 |
| abstract_inverted_index.hyperparameters | 66 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 3 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/4 |
| sustainable_development_goals[0].score | 0.4699999988079071 |
| sustainable_development_goals[0].display_name | Quality Education |
| citation_normalized_percentile | |
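
The `abstract_inverted_index.*` rows in the table map each token of the abstract to the word positions where it occurs. As a minimal sketch, assuming the index is available as a Python dictionary from token to a list of positions, the running text can be rebuilt by inverting that mapping and joining the tokens in position order. The function name `rebuild_abstract` is hypothetical, and the sample dictionary is only an excerpt: it holds the tokens listed in the table for positions 0 through 10.

```python
def rebuild_abstract(inverted_index):
    """Rebuild running text from a {token: [positions]} inverted index."""
    by_position = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    # Join tokens in ascending position order; missing positions are skipped.
    return " ".join(by_position[pos] for pos in sorted(by_position))

# Excerpt of the index above, restricted to positions 0..10.
sample_index = {
    "In": [0], "this": [1], "work,": [2], "a": [3], "Bayesian": [4],
    "approach": [5], "to": [6], "speaker": [7], "normalization": [8],
    "is": [9], "proposed": [10],
}
print(rebuild_abstract(sample_index))
# prints: In this work, a Bayesian approach to speaker normalization is proposed
```

Punctuation stays attached to its token (for example `work,`), so joining on single spaces is enough to recover the original wording.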