Physics-Informed Neural Networks for Speech Production
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2511.00428
The analysis of speech production based on physical models of the vocal folds and vocal tract is essential for studies on vocal-fold behavior and linguistic research. This paper proposes a speech production analysis method using physics-informed neural networks (PINNs). The networks are trained directly on the governing equations of vocal-fold vibration and vocal-tract acoustics. Vocal-fold collisions introduce nondifferentiability and vanishing gradients, both of which are challenging for PINNs. We demonstrate, however, that introducing a differentiable approximation function enables the analysis of vocal-fold vibrations within the PINN framework. The period of self-excited vocal-fold vibration is generally unknown. We show that by treating the period as a learnable network parameter, a periodic solution can be obtained. Furthermore, by implementing the coupling between glottal flow and vocal-tract acoustics as a hard constraint, glottis-tract interaction is achieved without additional loss terms. We confirmed the method's validity through forward and inverse analyses, demonstrating that the glottal flow rate, vocal-fold vibratory state, and subglottal pressure can be simultaneously estimated from speech signals. Notably, the same network architecture can be applied to both forward and inverse analyses, highlighting the versatility of this approach. The proposed method inherits the advantages of PINNs, including mesh-free computation and the natural incorporation of nonlinearities, and thus holds promise for a wide range of applications.
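The abstract names two ingredients without giving formulas: a differentiable stand-in for the nondifferentiable collision behavior, and a vibration period treated as a learnable parameter. The Python sketch below illustrates both ideas under stated assumptions and is not the authors' implementation. The softplus smoothing, the sharpness `beta`, the Fourier-feature time encoding, the helper names, and the example values (a negative opening of -0.01 in arbitrary units, an 8 ms period guess) are all hypothetical choices; the paper's actual approximation function and input encoding may differ.

```python
import math

# Illustrative sketch only; not the authors' implementation.
# Assumption: a softplus smooths the collision-induced kink. The paper's
# actual approximation function is not given in the abstract.


def hard_area(a: float) -> float:
    """Piecewise area max(a, 0): kinked at a = 0 and flat (zero slope) when closed."""
    return max(a, 0.0)


def soft_area(a: float, beta: float = 50.0) -> float:
    """Softplus (1/beta) * log(1 + exp(beta * a)); approaches max(a, 0) as beta grows."""
    z = beta * a
    # Numerically stable form: max(z, 0) + log1p(exp(-|z|)).
    return (max(z, 0.0) + math.log1p(math.exp(-abs(z)))) / beta


def soft_area_grad(a: float, beta: float = 50.0) -> float:
    """d soft_area / da is the logistic sigmoid of beta * a, which is never zero."""
    z = beta * a
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def periodic_features(t: float, period: float, n_harmonics: int = 3) -> list:
    """Fourier features of t that repeat exactly every `period` seconds.

    A network that sees only these features produces a `period`-periodic
    output, so `period` can be treated as a trainable scalar
    (hypothetical encoding, assumed for illustration).
    """
    feats = []
    for k in range(1, n_harmonics + 1):
        w = 2.0 * math.pi * k / period
        feats.extend([math.cos(w * t), math.sin(w * t)])
    return feats


def features_grad_wrt_period(t: float, period: float, n_harmonics: int = 3) -> list:
    """Analytic derivative of periodic_features with respect to `period`."""
    grads = []
    for k in range(1, n_harmonics + 1):
        w = 2.0 * math.pi * k / period
        dw_dperiod = -2.0 * math.pi * k / period ** 2
        grads.extend([
            -math.sin(w * t) * t * dw_dperiod,  # d/dT cos(w t)
            math.cos(w * t) * t * dw_dperiod,   # d/dT sin(w t)
        ])
    return grads


if __name__ == "__main__":
    opening = -0.01  # hypothetical negative opening: folds in contact
    print("hard:", hard_area(opening), "soft:", soft_area(opening))
    print("soft slope when closed:", soft_area_grad(opening))

    T = 0.008  # hypothetical initial period guess (8 ms, i.e. 125 Hz)
    t = 0.001
    drift = max(abs(x - y) for x, y in zip(periodic_features(t, T),
                                          periodic_features(t + T, T)))
    print("periodicity error:", drift)
    print("d features / d period:", features_grad_wrt_period(t, T))
```

In this sketch, the hard `max(a, 0)` gives a closed glottis (a < 0) exactly zero slope, which mirrors the vanishing-gradient issue the abstract mentions, while the softplus keeps a small positive slope there. Feeding a network only periodic features makes its output periodic by construction, and the nonzero derivative with respect to `period` is what would let a gradient-based optimizer adjust the period alongside the network weights.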
- Type: preprint
- Landing Page: http://arxiv.org/abs/2511.00428 (PDF: https://arxiv.org/pdf/2511.00428)
- OA Status: green
- OpenAlex ID: https://openalex.org/W4415937571
- Publication date: 2025-11-01
- Authors: Kazuya Yokota, Ryosuke Harakawa, Masaaki Baba, Masahiro Iwahashi
- Cited by: 0
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4415937571 |
| doi | https://doi.org/10.48550/arxiv.2511.00428 |
| ids.doi | https://doi.org/10.48550/arxiv.2511.00428 |
| ids.openalex | https://openalex.org/W4415937571 |
| fwci | |
| type | preprint |
| title | Physics-Informed Neural Networks for Speech Production |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | |
| locations[0].id | pmh:oai:arXiv.org:2511.00428 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2511.00428 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2511.00428 |
| locations[1].id | doi:10.48550/arxiv.2511.00428 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2511.00428 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5061651964 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-5930-6499 |
| authorships[0].author.display_name | Kazuya Yokota |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Yokota, Kazuya |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5023942291 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-7166-4440 |
| authorships[1].author.display_name | Ryosuke Harakawa |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Harakawa, Ryosuke |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5061012564 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-0877-7792 |
| authorships[2].author.display_name | Masaaki Baba |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Baba, Masaaki |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5036393534 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-7566-1247 |
| authorships[3].author.display_name | Masahiro Iwahashi |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Iwahashi, Masahiro |
| authorships[3].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2511.00428 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-11-05T00:00:00 |
| display_name | Physics-Informed Neural Networks for Speech Production |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic | |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location | identical to locations[0] (pmh:oai:arXiv.org:2511.00428) |
| primary_location | identical to locations[0] (pmh:oai:arXiv.org:2511.00428) |
| publication_date | 2025-11-01 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 4 |
| citation_normalized_percentile | |