Expressive Range Characterization of Open Text-to-Audio Models
2025 · Open Access · DOI: https://doi.org/10.1609/aiide.v21i1.36813
Text-to-audio models are a type of generative model that produces audio output in response to a given textual prompt. Although level generators and the properties of the functional content that they create (e.g., playability) dominate most discourse in procedurally generated content (PCG), games that emotionally resonate with players tend to weave together a range of creative and multimodal content (e.g., music, sounds, visuals, narrative tone), and multimodal models have begun seeing at least experimental use for this purpose. However, it remains unclear what exactly such models generate, and with what degree of variability and fidelity: audio is an extremely broad class of output for a generative system to target. Within the PCG community, expressive range analysis (ERA) has been used as a quantitative way to characterize generators' output space, especially for level generators. This paper adapts ERA to text-to-audio models, making the analysis tractable by looking at the expressive range of outputs for specific, fixed prompts. Experiments are conducted by prompting the models with several standardized prompts derived from the Environmental Sound Classification (ESC-50) dataset. The resulting audio is analyzed along key acoustic dimensions (e.g., pitch, loudness, and timbre). More broadly, this paper offers a framework for ERA-based exploratory evaluation of generative audio models.
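The abstract's pipeline (generate many clips for one fixed prompt, measure each clip along acoustic dimensions, then look at how the measurements are distributed) can be sketched as below. This is a minimal illustration and not the authors' implementation: the two descriptors (RMS level in dB as a loudness proxy, spectral centroid as a timbre proxy), the function names, the bin count, and the synthetic sine tones standing in for model outputs are all assumptions made for this example.

```python
import numpy as np


def rms_level_db(signal, eps=1e-12):
    """RMS level in dB relative to full scale; a simple loudness proxy (assumed metric)."""
    rms = np.sqrt(np.mean(np.square(signal)))
    return float(20.0 * np.log10(rms + eps))


def spectral_centroid_hz(signal, sample_rate):
    """Magnitude-weighted mean frequency; a simple timbre proxy (assumed metric)."""
    magnitudes = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    total = magnitudes.sum()
    if total == 0.0:
        return 0.0  # silent clip: centroid is undefined, report 0 by convention
    return float((freqs * magnitudes).sum() / total)


def expressive_range(clips, sample_rate, bins=10):
    """2D histogram of (level, centroid) over all clips generated for one prompt."""
    levels = np.array([rms_level_db(c) for c in clips])
    centroids = np.array([spectral_centroid_hz(c, sample_rate) for c in clips])
    return np.histogram2d(levels, centroids, bins=bins)


# Hypothetical stand-in for "N generations from one fixed prompt":
# one-second sine tones with random frequency and amplitude.
SAMPLE_RATE = 16000
rng = np.random.default_rng(0)
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
clips = [
    rng.uniform(0.1, 1.0) * np.sin(2 * np.pi * rng.integers(100, 2000) * t)
    for _ in range(50)
]

hist, level_edges, centroid_edges = expressive_range(clips, SAMPLE_RATE, bins=8)
reference_tone = np.sin(2 * np.pi * 440 * t)
```

A histogram with mass concentrated in a few cells would suggest a narrow expressive range for that prompt, while a spread-out histogram would suggest more varied output; real use would replace the synthetic tones with decoded model audio.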
- Type: article
- Landing Page: https://doi.org/10.1609/aiide.v21i1.36813 (PDF: https://ojs.aaai.org/index.php/AIIDE/article/download/36813/38951)
- OA Status: bronze
- OpenAlex ID: https://openalex.org/W4416014018
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4416014018 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.1609/aiide.v21i1.36813 (Digital Object Identifier)
- Title: Expressive Range Characterization of Open Text-to-Audio Models (work title)
- Type: article (OpenAlex work type)
- Publication year: 2025 (year of publication)
- Publication date: 2025-11-07 (full publication date if available)
- Authors: Jonathan Morse, Ali Naderi, Swen E. Gaudl, Mark Cartwright, Amy K. Hoover, Mark Nelson (list of authors in order)
- Landing page: https://doi.org/10.1609/aiide.v21i1.36813 (publisher landing page)
- PDF URL: https://ojs.aaai.org/index.php/AIIDE/article/download/36813/38951 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: bronze (open access status per OpenAlex)
- OA URL: https://ojs.aaai.org/index.php/AIIDE/article/download/36813/38951 (direct OA link when available)
- Cited by: 0 (total citation count in OpenAlex)
Full payload
| id | https://openalex.org/W4416014018 |
|---|---|
| doi | https://doi.org/10.1609/aiide.v21i1.36813 |
| ids.doi | https://doi.org/10.1609/aiide.v21i1.36813 |
| ids.openalex | https://openalex.org/W4416014018 |
| fwci | |
| type | article |
| title | Expressive Range Characterization of Open Text-to-Audio Models |
| biblio.issue | 1 |
| biblio.volume | 21 |
| biblio.last_page | 98 |
| biblio.first_page | 91 |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | |
| locations[0].id | doi:10.1609/aiide.v21i1.36813 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4387284112 |
| locations[0].source.issn | 2326-909X, 2334-0924 |
| locations[0].source.type | journal |
| locations[0].source.is_oa | False |
| locations[0].source.issn_l | 2326-909X |
| locations[0].source.is_core | True |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment |
| locations[0].source.host_organization | |
| locations[0].source.host_organization_name | |
| locations[0].license | |
| locations[0].pdf_url | https://ojs.aaai.org/index.php/AIIDE/article/download/36813/38951 |
| locations[0].version | publishedVersion |
| locations[0].raw_type | journal-article |
| locations[0].license_id | |
| locations[0].is_accepted | True |
| locations[0].is_published | True |
| locations[0].raw_source_name | Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment |
| locations[0].landing_page_url | https://doi.org/10.1609/aiide.v21i1.36813 |
| indexed_in | crossref |
| authorships[0].author.id | https://openalex.org/A5055319457 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Jonathan Morse |
| authorships[0].countries | US |
| authorships[0].affiliations[0].institution_ids | https://openalex.org/I118118575 |
| authorships[0].affiliations[0].raw_affiliation_string | New Jersey Institute of Technology, Newark, New Jersey, USA |
| authorships[0].institutions[0].id | https://openalex.org/I118118575 |
| authorships[0].institutions[0].ror | https://ror.org/05e74xb87 |
| authorships[0].institutions[0].type | education |
| authorships[0].institutions[0].lineage | https://openalex.org/I118118575 |
| authorships[0].institutions[0].country_code | US |
| authorships[0].institutions[0].display_name | New Jersey Institute of Technology |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Jonathan Morse |
| authorships[0].is_corresponding | False |
| authorships[0].raw_affiliation_strings | New Jersey Institute of Technology, Newark, New Jersey, USA |
| authorships[1].author.id | https://openalex.org/A5103275545 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-7281-2150 |
| authorships[1].author.display_name | Ali Naderi |
| authorships[1].countries | US |
| authorships[1].affiliations[0].institution_ids | https://openalex.org/I118118575 |
| authorships[1].affiliations[0].raw_affiliation_string | New Jersey Institute of Technology, Newark, New Jersey, USA |
| authorships[1].institutions[0].id | https://openalex.org/I118118575 |
| authorships[1].institutions[0].ror | https://ror.org/05e74xb87 |
| authorships[1].institutions[0].type | education |
| authorships[1].institutions[0].lineage | https://openalex.org/I118118575 |
| authorships[1].institutions[0].country_code | US |
| authorships[1].institutions[0].display_name | New Jersey Institute of Technology |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Azadeh Naderi |
| authorships[1].is_corresponding | False |
| authorships[1].raw_affiliation_strings | New Jersey Institute of Technology, Newark, New Jersey, USA |
| authorships[2].author.id | https://openalex.org/A5044426073 |
| authorships[2].author.orcid | https://orcid.org/0000-0003-3116-3761 |
| authorships[2].author.display_name | Swen E. Gaudl |
| authorships[2].countries | SE |
| authorships[2].affiliations[0].institution_ids | https://openalex.org/I881427289 |
| authorships[2].affiliations[0].raw_affiliation_string | University of Gothenburg, Gothenburg, Sweden |
| authorships[2].institutions[0].id | https://openalex.org/I881427289 |
| authorships[2].institutions[0].ror | https://ror.org/01tm6cn81 |
| authorships[2].institutions[0].type | education |
| authorships[2].institutions[0].lineage | https://openalex.org/I881427289 |
| authorships[2].institutions[0].country_code | SE |
| authorships[2].institutions[0].display_name | University of Gothenburg |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Swen Gaudl |
| authorships[2].is_corresponding | False |
| authorships[2].raw_affiliation_strings | University of Gothenburg, Gothenburg, Sweden |
| authorships[3].author.id | https://openalex.org/A5056532548 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-5908-390X |
| authorships[3].author.display_name | Mark Cartwright |
| authorships[3].countries | US |
| authorships[3].affiliations[0].institution_ids | https://openalex.org/I118118575 |
| authorships[3].affiliations[0].raw_affiliation_string | New Jersey Institute of Technology, Newark, New Jersey, USA |
| authorships[3].institutions[0].id | https://openalex.org/I118118575 |
| authorships[3].institutions[0].ror | https://ror.org/05e74xb87 |
| authorships[3].institutions[0].type | education |
| authorships[3].institutions[0].lineage | https://openalex.org/I118118575 |
| authorships[3].institutions[0].country_code | US |
| authorships[3].institutions[0].display_name | New Jersey Institute of Technology |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Mark Cartwright |
| authorships[3].is_corresponding | False |
| authorships[3].raw_affiliation_strings | New Jersey Institute of Technology, Newark, New Jersey, USA |
| authorships[4].author.id | https://openalex.org/A5063577751 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-4661-8178 |
| authorships[4].author.display_name | Amy K. Hoover |
| authorships[4].countries | US |
| authorships[4].affiliations[0].institution_ids | https://openalex.org/I118118575 |
| authorships[4].affiliations[0].raw_affiliation_string | New Jersey Institute of Technology, Newark, New Jersey, USA |
| authorships[4].institutions[0].id | https://openalex.org/I118118575 |
| authorships[4].institutions[0].ror | https://ror.org/05e74xb87 |
| authorships[4].institutions[0].type | education |
| authorships[4].institutions[0].lineage | https://openalex.org/I118118575 |
| authorships[4].institutions[0].country_code | US |
| authorships[4].institutions[0].display_name | New Jersey Institute of Technology |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Amy K. Hoover |
| authorships[4].is_corresponding | False |
| authorships[4].raw_affiliation_strings | New Jersey Institute of Technology, Newark, New Jersey, USA |
| authorships[5].author.id | https://openalex.org/A5101866629 |
| authorships[5].author.orcid | https://orcid.org/0000-0003-1882-8896 |
| authorships[5].author.display_name | Mark Nelson |
| authorships[5].countries | US |
| authorships[5].affiliations[0].institution_ids | https://openalex.org/I181401687 |
| authorships[5].affiliations[0].raw_affiliation_string | American University, Washington, D.C., USA |
| authorships[5].institutions[0].id | https://openalex.org/I181401687 |
| authorships[5].institutions[0].ror | https://ror.org/052w4zt36 |
| authorships[5].institutions[0].type | education |
| authorships[5].institutions[0].lineage | https://openalex.org/I181401687 |
| authorships[5].institutions[0].country_code | US |
| authorships[5].institutions[0].display_name | American University |
| authorships[5].author_position | last |
| authorships[5].raw_author_name | Mark J. Nelson |
| authorships[5].is_corresponding | False |
| authorships[5].raw_affiliation_strings | American University, Washington, D.C., USA |
| has_content.pdf | True |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://ojs.aaai.org/index.php/AIIDE/article/download/36813/38951 |
| open_access.oa_status | bronze |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-11-07T00:00:00 |
| display_name | Expressive Range Characterization of Open Text-to-Audio Models |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-08T23:21:52.890332 |
| primary_topic | |
| cited_by_count | 0 |
| locations_count | 1 |
| best_oa_location.id | doi:10.1609/aiide.v21i1.36813 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4387284112 |
| best_oa_location.source.issn | 2326-909X, 2334-0924 |
| best_oa_location.source.type | journal |
| best_oa_location.source.is_oa | False |
| best_oa_location.source.issn_l | 2326-909X |
| best_oa_location.source.is_core | True |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment |
| best_oa_location.source.host_organization | |
| best_oa_location.source.host_organization_name | |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://ojs.aaai.org/index.php/AIIDE/article/download/36813/38951 |
| best_oa_location.version | publishedVersion |
| best_oa_location.raw_type | journal-article |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | True |
| best_oa_location.is_published | True |
| best_oa_location.raw_source_name | Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment |
| best_oa_location.landing_page_url | https://doi.org/10.1609/aiide.v21i1.36813 |
| primary_location.id | doi:10.1609/aiide.v21i1.36813 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4387284112 |
| primary_location.source.issn | 2326-909X, 2334-0924 |
| primary_location.source.type | journal |
| primary_location.source.is_oa | False |
| primary_location.source.issn_l | 2326-909X |
| primary_location.source.is_core | True |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment |
| primary_location.source.host_organization | |
| primary_location.source.host_organization_name | |
| primary_location.license | |
| primary_location.pdf_url | https://ojs.aaai.org/index.php/AIIDE/article/download/36813/38951 |
| primary_location.version | publishedVersion |
| primary_location.raw_type | journal-article |
| primary_location.license_id | |
| primary_location.is_accepted | True |
| primary_location.is_published | True |
| primary_location.raw_source_name | Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment |
| primary_location.landing_page_url | https://doi.org/10.1609/aiide.v21i1.36813 |
| publication_date | 2025-11-07 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 3, 15, 52, 104, 121, 194 |
| abstract_inverted_index.an | 97 |
| abstract_inverted_index.as | 120 |
| abstract_inverted_index.at | 71, 146 |
| abstract_inverted_index.by | 144, 159 |
| abstract_inverted_index.in | 12, 37 |
| abstract_inverted_index.is | 96, 178 |
| abstract_inverted_index.it | 79 |
| abstract_inverted_index.of | 5, 25, 54, 91, 101, 150, 200 |
| abstract_inverted_index.to | 14, 49, 107, 124, 137 |
| abstract_inverted_index.ERA | 136 |
| abstract_inverted_index.PCG | 111 |
| abstract_inverted_index.The | 175 |
| abstract_inverted_index.and | 22, 56, 65, 87, 93, 187 |
| abstract_inverted_index.are | 2, 157 |
| abstract_inverted_index.for | 75, 103, 130, 152, 196 |
| abstract_inverted_index.has | 117 |
| abstract_inverted_index.key | 181 |
| abstract_inverted_index.the | 23, 26, 110, 141, 147, 161, 169 |
| abstract_inverted_index.use | 74 |
| abstract_inverted_index.way | 123 |
| abstract_inverted_index.More | 189 |
| abstract_inverted_index.This | 133 |
| abstract_inverted_index.been | 118 |
| abstract_inverted_index.from | 168 |
| abstract_inverted_index.have | 68 |
| abstract_inverted_index.most | 35 |
| abstract_inverted_index.such | 84 |
| abstract_inverted_index.tend | 48 |
| abstract_inverted_index.that | 8, 29, 43 |
| abstract_inverted_index.they | 30 |
| abstract_inverted_index.this | 76, 191 |
| abstract_inverted_index.type | 4 |
| abstract_inverted_index.used | 119 |
| abstract_inverted_index.what | 82, 89 |
| abstract_inverted_index.with | 46, 88, 163 |
| abstract_inverted_index.(ERA) | 116 |
| abstract_inverted_index.Sound | 171 |
| abstract_inverted_index.along | 180 |
| abstract_inverted_index.audio | 10, 95, 177, 202 |
| abstract_inverted_index.begun | 69 |
| abstract_inverted_index.broad | 99 |
| abstract_inverted_index.class | 100 |
| abstract_inverted_index.fixed | 154 |
| abstract_inverted_index.games | 42 |
| abstract_inverted_index.given | 16 |
| abstract_inverted_index.least | 72 |
| abstract_inverted_index.level | 20, 131 |
| abstract_inverted_index.model | 7 |
| abstract_inverted_index.paper | 134, 192 |
| abstract_inverted_index.range | 53, 114, 149 |
| abstract_inverted_index.weave | 50 |
| abstract_inverted_index.(PCG), | 41 |
| abstract_inverted_index.(e.g., | 32, 59, 184 |
| abstract_inverted_index.Within | 109 |
| abstract_inverted_index.adapts | 135 |
| abstract_inverted_index.create | 31 |
| abstract_inverted_index.degree | 90 |
| abstract_inverted_index.making | 140 |
| abstract_inverted_index.models | 1, 67, 85, 162 |
| abstract_inverted_index.music, | 60 |
| abstract_inverted_index.offers | 193 |
| abstract_inverted_index.output | 11, 102, 127 |
| abstract_inverted_index.pitch, | 185 |
| abstract_inverted_index.seeing | 70 |
| abstract_inverted_index.space, | 128 |
| abstract_inverted_index.system | 106 |
| abstract_inverted_index.tone), | 64 |
| abstract_inverted_index.content | 28, 40, 58 |
| abstract_inverted_index.derived | 167 |
| abstract_inverted_index.exactly | 83 |
| abstract_inverted_index.looking | 145 |
| abstract_inverted_index.models, | 139 |
| abstract_inverted_index.models. | 203 |
| abstract_inverted_index.outputs | 151 |
| abstract_inverted_index.players | 47 |
| abstract_inverted_index.prompt. | 18 |
| abstract_inverted_index.prompts | 166 |
| abstract_inverted_index.remains | 80 |
| abstract_inverted_index.several | 164 |
| abstract_inverted_index.sounds, | 61 |
| abstract_inverted_index.target. | 108 |
| abstract_inverted_index.textual | 17 |
| abstract_inverted_index.unclear | 81 |
| abstract_inverted_index.(ESC-50) | 173 |
| abstract_inverted_index.Although | 19 |
| abstract_inverted_index.However, | 78 |
| abstract_inverted_index.acoustic | 182 |
| abstract_inverted_index.analysis | 115, 142 |
| abstract_inverted_index.analyzed | 179 |
| abstract_inverted_index.broadly, | 190 |
| abstract_inverted_index.creative | 55 |
| abstract_inverted_index.dataset. | 174 |
| abstract_inverted_index.dominate | 34 |
| abstract_inverted_index.produces | 9 |
| abstract_inverted_index.prompts. | 155 |
| abstract_inverted_index.purpose. | 77 |
| abstract_inverted_index.resonate | 45 |
| abstract_inverted_index.response | 13 |
| abstract_inverted_index.timbre). | 188 |
| abstract_inverted_index.together | 51 |
| abstract_inverted_index.visuals, | 62 |
| abstract_inverted_index.ERA-based | 197 |
| abstract_inverted_index.conducted | 158 |
| abstract_inverted_index.discourse | 36 |
| abstract_inverted_index.extremely | 98 |
| abstract_inverted_index.fidelity: | 94 |
| abstract_inverted_index.framework | 195 |
| abstract_inverted_index.generate, | 86 |
| abstract_inverted_index.generated | 39 |
| abstract_inverted_index.loudness, | 186 |
| abstract_inverted_index.narrative | 63 |
| abstract_inverted_index.prompting | 160 |
| abstract_inverted_index.resulting | 176 |
| abstract_inverted_index.specific, | 153 |
| abstract_inverted_index.tractable | 143 |
| abstract_inverted_index.community, | 112 |
| abstract_inverted_index.dimensions | 183 |
| abstract_inverted_index.especially | 129 |
| abstract_inverted_index.evaluation | 199 |
| abstract_inverted_index.expressive | 113, 148 |
| abstract_inverted_index.functional | 27 |
| abstract_inverted_index.generative | 6, 105, 201 |
| abstract_inverted_index.generators | 21 |
| abstract_inverted_index.multimodal | 57, 66 |
| abstract_inverted_index.properties | 24 |
| abstract_inverted_index.Experiments | 156 |
| abstract_inverted_index.emotionally | 44 |
| abstract_inverted_index.exploratory | 198 |
| abstract_inverted_index.generators' | 126 |
| abstract_inverted_index.generators. | 132 |
| abstract_inverted_index.variability | 92 |
| abstract_inverted_index.characterize | 125 |
| abstract_inverted_index.experimental | 73 |
| abstract_inverted_index.playability) | 33 |
| abstract_inverted_index.procedurally | 38 |
| abstract_inverted_index.quantitative | 122 |
| abstract_inverted_index.standardized | 165 |
| abstract_inverted_index.Environmental | 170 |
| abstract_inverted_index.Text-to-audio | 0 |
| abstract_inverted_index.text-to-audio | 138 |
| abstract_inverted_index.Classification | 172 |
| cited_by_percentile_year | |
| countries_distinct_count | 2 |
| institutions_distinct_count | 6 |
| citation_normalized_percentile | |
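
The `abstract_inverted_index.*` rows in the payload store the abstract as a map from each token to the positions where it occurs. A minimal sketch of turning such a map back into running text is shown below; the function name `reconstruct_abstract` is hypothetical, and the sample dictionary is only a small subset of the rows above (positions 0 through 7), not the full index.

```python
def reconstruct_abstract(inverted_index):
    """Rebuild text from a {token: [positions]} map by sorting tokens by position."""
    tokens_by_position = {}
    for token, positions in inverted_index.items():
        for position in positions:
            tokens_by_position[position] = token
    return " ".join(tokens_by_position[p] for p in sorted(tokens_by_position))


# Subset of the payload rows, restricted to the first eight positions.
sample_index = {
    "Text-to-audio": [0],
    "models": [1],
    "are": [2],
    "a": [3],
    "type": [4],
    "of": [5],
    "generative": [6],
    "model": [7],
}

opening_words = reconstruct_abstract(sample_index)
print(opening_words)
```

Applied to the full set of rows, the same function should yield the abstract text with its original punctuation, since punctuation is stored as part of each token.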