Speech Prediction in Silent Videos Using Variational Autoencoders
2021 · Open Access · DOI: https://doi.org/10.1109/icassp39728.2021.9414040
Understanding the relationship between the auditory and visual signals is crucial for many applications, ranging from computer-generated imagery (CGI) and video editing automation to assisting people with hearing or visual impairments. However, this is challenging because the distributions of both the audio and visual modalities are inherently multimodal. Most existing methods nevertheless ignore this multimodality and assume that there is a deterministic one-to-one mapping between the two modalities. This can lead to low-quality predictions, as the model collapses to optimizing for the average behavior rather than learning the full data distribution. In this paper, we present a stochastic model for generating speech from a silent video. The proposed model combines recurrent neural networks and variational deep generative models to learn the conditional distribution of the auditory signal given the visual signal. We demonstrate the performance of our model on the GRID dataset using standard benchmarks.
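The averaging failure mentioned in the abstract can be illustrated with a toy example. The sketch below is not the paper's model or data: it assumes a hypothetical one-dimensional target that is equally likely to fall near -1 or +1 for the same input, and shows that the single prediction minimizing mean squared error sits near 0, far from every observed target. The names `sample_targets` and `mse` are illustrative only.

```python
import random
import statistics


def sample_targets(n, seed=0):
    """Draw n targets from a two-mode distribution centred at -1 and +1 (toy assumption)."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) + rng.gauss(0.0, 0.05) for _ in range(n)]


def mse(prediction, targets):
    """Mean squared error of one constant prediction against all targets."""
    return sum((t - prediction) ** 2 for t in targets) / len(targets)


targets = sample_targets(10_000)

# For squared error, the best constant prediction is the sample mean.
best_constant = statistics.fmean(targets)

print("MSE-optimal prediction:", round(best_constant, 3))
print("MSE at the mean:", round(mse(best_constant, targets), 3))
print("MSE when predicting one mode (+1):", round(mse(1.0, targets), 3))
print("distance to nearest real target:",
      round(min(abs(t - best_constant) for t in targets), 3))
```

A deterministic regressor trained this way scores better than one that commits to a single mode, yet its output resembles none of the real targets. A model that learns the conditional distribution and samples from it avoids this particular failure.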
- Type: preprint
- Language: en
- Landing Page: https://doi.org/10.1109/icassp39728.2021.9414040
- OA Status: green
- Cited By: 1
- References: 23
- Related Works: 20
- OpenAlex ID: https://openalex.org/W3103085143
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W3103085143 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.1109/icassp39728.2021.9414040 (Digital Object Identifier)
- Title: Speech Prediction in Silent Videos Using Variational Autoencoders (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2021 (year of publication)
- Publication date: 2021-05-13 (full publication date if available)
- Authors: Ravindra Yadav, Ashish Sardana, Vinay P. Namboodiri, Rajesh M. Hegde (list of authors in order)
- Landing page: https://doi.org/10.1109/icassp39728.2021.9414040 (publisher landing page)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2011.07340 (direct OA link when available)
- Concepts: Computer science, Speech recognition, Modalities, Modality (human–computer interaction), Artificial intelligence, Ranging, Generative model, Visualization, SIGNAL (programming language), Grid, Artificial neural network, Generative grammar, Machine learning, Telecommunications, Programming language, Social science, Mathematics, Geometry, Sociology (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 1 (total citation count in OpenAlex)
- Citations by year (recent): 2024: 1 (per-year citation counts, last 5 years)
- References (count): 23 (number of works referenced by this work)
- Related works (count): 20 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W3103085143 |
|---|---|
| doi | https://doi.org/10.1109/icassp39728.2021.9414040 |
| ids.doi | https://doi.org/10.1109/icassp39728.2021.9414040 |
| ids.mag | 3103085143 |
| ids.openalex | https://openalex.org/W3103085143 |
| fwci | 0.14397953 |
| type | preprint |
| title | Speech Prediction in Silent Videos Using Variational Autoencoders |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | 7052 |
| biblio.first_page | 7048 |
| topics[0].id | https://openalex.org/T10860 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 1.0 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1711 |
| topics[0].subfield.display_name | Signal Processing |
| topics[0].display_name | Speech and Audio Processing |
| topics[1].id | https://openalex.org/T11309 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9993000030517578 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1711 |
| topics[1].subfield.display_name | Signal Processing |
| topics[1].display_name | Music and Audio Processing |
| topics[2].id | https://openalex.org/T11439 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9933000206947327 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1707 |
| topics[2].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[2].display_name | Video Analysis and Summarization |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.7863545417785645 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C28490314 |
| concepts[1].level | 1 |
| concepts[1].score | 0.55987948179245 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q189436 |
| concepts[1].display_name | Speech recognition |
| concepts[2].id | https://openalex.org/C2779903281 |
| concepts[2].level | 2 |
| concepts[2].score | 0.5275102257728577 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q6888026 |
| concepts[2].display_name | Modalities |
| concepts[3].id | https://openalex.org/C2780226545 |
| concepts[3].level | 2 |
| concepts[3].score | 0.5226906538009644 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q6888030 |
| concepts[3].display_name | Modality (human–computer interaction) |
| concepts[4].id | https://openalex.org/C154945302 |
| concepts[4].level | 1 |
| concepts[4].score | 0.5217869281768799 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[4].display_name | Artificial intelligence |
| concepts[5].id | https://openalex.org/C115051666 |
| concepts[5].level | 2 |
| concepts[5].score | 0.5129973292350769 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q6522493 |
| concepts[5].display_name | Ranging |
| concepts[6].id | https://openalex.org/C167966045 |
| concepts[6].level | 3 |
| concepts[6].score | 0.4808812439441681 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q5532625 |
| concepts[6].display_name | Generative model |
| concepts[7].id | https://openalex.org/C36464697 |
| concepts[7].level | 2 |
| concepts[7].score | 0.4383709728717804 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q451553 |
| concepts[7].display_name | Visualization |
| concepts[8].id | https://openalex.org/C2779843651 |
| concepts[8].level | 2 |
| concepts[8].score | 0.42874160408973694 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q7390335 |
| concepts[8].display_name | SIGNAL (programming language) |
| concepts[9].id | https://openalex.org/C187691185 |
| concepts[9].level | 2 |
| concepts[9].score | 0.42590227723121643 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q2020720 |
| concepts[9].display_name | Grid |
| concepts[10].id | https://openalex.org/C50644808 |
| concepts[10].level | 2 |
| concepts[10].score | 0.4198591113090515 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q192776 |
| concepts[10].display_name | Artificial neural network |
| concepts[11].id | https://openalex.org/C39890363 |
| concepts[11].level | 2 |
| concepts[11].score | 0.4080892503261566 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q36108 |
| concepts[11].display_name | Generative grammar |
| concepts[12].id | https://openalex.org/C119857082 |
| concepts[12].level | 1 |
| concepts[12].score | 0.3395358622074127 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q2539 |
| concepts[12].display_name | Machine learning |
| concepts[13].id | https://openalex.org/C76155785 |
| concepts[13].level | 1 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q418 |
| concepts[13].display_name | Telecommunications |
| concepts[14].id | https://openalex.org/C199360897 |
| concepts[14].level | 1 |
| concepts[14].score | 0.0 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q9143 |
| concepts[14].display_name | Programming language |
| concepts[15].id | https://openalex.org/C36289849 |
| concepts[15].level | 1 |
| concepts[15].score | 0.0 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q34749 |
| concepts[15].display_name | Social science |
| concepts[16].id | https://openalex.org/C33923547 |
| concepts[16].level | 0 |
| concepts[16].score | 0.0 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[16].display_name | Mathematics |
| concepts[17].id | https://openalex.org/C2524010 |
| concepts[17].level | 1 |
| concepts[17].score | 0.0 |
| concepts[17].wikidata | https://www.wikidata.org/wiki/Q8087 |
| concepts[17].display_name | Geometry |
| concepts[18].id | https://openalex.org/C144024400 |
| concepts[18].level | 0 |
| concepts[18].score | 0.0 |
| concepts[18].wikidata | https://www.wikidata.org/wiki/Q21201 |
| concepts[18].display_name | Sociology |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.7863545417785645 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/speech-recognition |
| keywords[1].score | 0.55987948179245 |
| keywords[1].display_name | Speech recognition |
| keywords[2].id | https://openalex.org/keywords/modalities |
| keywords[2].score | 0.5275102257728577 |
| keywords[2].display_name | Modalities |
| keywords[3].id | https://openalex.org/keywords/modality |
| keywords[3].score | 0.5226906538009644 |
| keywords[3].display_name | Modality (human–computer interaction) |
| keywords[4].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[4].score | 0.5217869281768799 |
| keywords[4].display_name | Artificial intelligence |
| keywords[5].id | https://openalex.org/keywords/ranging |
| keywords[5].score | 0.5129973292350769 |
| keywords[5].display_name | Ranging |
| keywords[6].id | https://openalex.org/keywords/generative-model |
| keywords[6].score | 0.4808812439441681 |
| keywords[6].display_name | Generative model |
| keywords[7].id | https://openalex.org/keywords/visualization |
| keywords[7].score | 0.4383709728717804 |
| keywords[7].display_name | Visualization |
| keywords[8].id | https://openalex.org/keywords/signal |
| keywords[8].score | 0.42874160408973694 |
| keywords[8].display_name | SIGNAL (programming language) |
| keywords[9].id | https://openalex.org/keywords/grid |
| keywords[9].score | 0.42590227723121643 |
| keywords[9].display_name | Grid |
| keywords[10].id | https://openalex.org/keywords/artificial-neural-network |
| keywords[10].score | 0.4198591113090515 |
| keywords[10].display_name | Artificial neural network |
| keywords[11].id | https://openalex.org/keywords/generative-grammar |
| keywords[11].score | 0.4080892503261566 |
| keywords[11].display_name | Generative grammar |
| keywords[12].id | https://openalex.org/keywords/machine-learning |
| keywords[12].score | 0.3395358622074127 |
| keywords[12].display_name | Machine learning |
| language | en |
| locations[0].id | doi:10.1109/icassp39728.2021.9414040 |
| locations[0].is_oa | False |
| locations[0].source | |
| locations[0].license | |
| locations[0].pdf_url | |
| locations[0].version | publishedVersion |
| locations[0].raw_type | proceedings-article |
| locations[0].license_id | |
| locations[0].is_accepted | True |
| locations[0].is_published | True |
| locations[0].raw_source_name | ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
| locations[0].landing_page_url | https://doi.org/10.1109/icassp39728.2021.9414040 |
| locations[1].id | pmh:oai:arXiv.org:2011.07340 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by-nc-nd |
| locations[1].pdf_url | https://arxiv.org/pdf/2011.07340 |
| locations[1].version | submittedVersion |
| locations[1].raw_type | text |
| locations[1].license_id | https://openalex.org/licenses/cc-by-nc-nd |
| locations[1].is_accepted | False |
| locations[1].is_published | False |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | http://arxiv.org/abs/2011.07340 |
| locations[2].id | mag:3103085143 |
| locations[2].is_oa | True |
| locations[2].source.id | https://openalex.org/S4306400194 |
| locations[2].source.issn | |
| locations[2].source.type | repository |
| locations[2].source.is_oa | True |
| locations[2].source.issn_l | |
| locations[2].source.is_core | False |
| locations[2].source.is_in_doaj | False |
| locations[2].source.display_name | arXiv (Cornell University) |
| locations[2].source.host_organization | https://openalex.org/I205783295 |
| locations[2].source.host_organization_name | Cornell University |
| locations[2].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[2].license | |
| locations[2].pdf_url | |
| locations[2].version | submittedVersion |
| locations[2].raw_type | |
| locations[2].license_id | |
| locations[2].is_accepted | False |
| locations[2].is_published | False |
| locations[2].raw_source_name | arXiv (Cornell University) |
| locations[2].landing_page_url | https://arxiv.org/pdf/2011.07340.pdf |
| locations[3].id | doi:10.48550/arxiv.2011.07340 |
| locations[3].is_oa | True |
| locations[3].source.id | https://openalex.org/S4306400194 |
| locations[3].source.issn | |
| locations[3].source.type | repository |
| locations[3].source.is_oa | True |
| locations[3].source.issn_l | |
| locations[3].source.is_core | False |
| locations[3].source.is_in_doaj | False |
| locations[3].source.display_name | arXiv (Cornell University) |
| locations[3].source.host_organization | https://openalex.org/I205783295 |
| locations[3].source.host_organization_name | Cornell University |
| locations[3].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[3].license | |
| locations[3].pdf_url | |
| locations[3].version | |
| locations[3].raw_type | article |
| locations[3].license_id | |
| locations[3].is_accepted | False |
| locations[3].is_published | |
| locations[3].raw_source_name | |
| locations[3].landing_page_url | https://doi.org/10.48550/arxiv.2011.07340 |
| locations[4].id | doi:10.17023/x8zh-0b82 |
| locations[4].is_oa | True |
| locations[4].source.id | https://openalex.org/S7407051697 |
| locations[4].source.type | repository |
| locations[4].source.is_oa | False |
| locations[4].source.issn_l | |
| locations[4].source.is_core | False |
| locations[4].source.is_in_doaj | False |
| locations[4].source.display_name | IEEE RESOURCE CENTERS |
| locations[4].source.host_organization | |
| locations[4].source.host_organization_name | |
| locations[4].license | |
| locations[4].pdf_url | |
| locations[4].version | |
| locations[4].raw_type | article |
| locations[4].license_id | |
| locations[4].is_accepted | False |
| locations[4].is_published | |
| locations[4].raw_source_name | |
| locations[4].landing_page_url | https://doi.org/10.17023/x8zh-0b82 |
| indexed_in | arxiv, crossref, datacite |
| authorships[0].author.id | https://openalex.org/A5010648323 |
| authorships[0].author.orcid | https://orcid.org/0000-0003-4628-0688 |
| authorships[0].author.display_name | Ravindra Yadav |
| authorships[0].countries | IN |
| authorships[0].affiliations[0].institution_ids | https://openalex.org/I94234084 |
| authorships[0].affiliations[0].raw_affiliation_string | Indian Institute of Technology - Kanpur, India#TAB# |
| authorships[0].institutions[0].id | https://openalex.org/I94234084 |
| authorships[0].institutions[0].ror | https://ror.org/05pjsgx75 |
| authorships[0].institutions[0].type | education |
| authorships[0].institutions[0].lineage | https://openalex.org/I94234084 |
| authorships[0].institutions[0].country_code | IN |
| authorships[0].institutions[0].display_name | Indian Institute of Technology Kanpur |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Ravindra Yadav |
| authorships[0].is_corresponding | False |
| authorships[0].raw_affiliation_strings | Indian Institute of Technology - Kanpur, India#TAB# |
| authorships[1].author.id | https://openalex.org/A5083439972 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Ashish Sardana |
| authorships[1].countries | GB |
| authorships[1].affiliations[0].institution_ids | https://openalex.org/I1304085615 |
| authorships[1].affiliations[0].raw_affiliation_string | nVidia |
| authorships[1].institutions[0].id | https://openalex.org/I1304085615 |
| authorships[1].institutions[0].ror | https://ror.org/02kr42612 |
| authorships[1].institutions[0].type | company |
| authorships[1].institutions[0].lineage | https://openalex.org/I1304085615, https://openalex.org/I4210127875 |
| authorships[1].institutions[0].country_code | GB |
| authorships[1].institutions[0].display_name | Nvidia (United Kingdom) |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Ashish Sardana |
| authorships[1].is_corresponding | False |
| authorships[1].raw_affiliation_strings | nVidia |
| authorships[2].author.id | https://openalex.org/A5007109424 |
| authorships[2].author.orcid | https://orcid.org/0000-0001-5262-9722 |
| authorships[2].author.display_name | Vinay P. Namboodiri |
| authorships[2].countries | GB |
| authorships[2].affiliations[0].institution_ids | https://openalex.org/I51601045 |
| authorships[2].affiliations[0].raw_affiliation_string | #N# University of Bath, UK#N# |
| authorships[2].institutions[0].id | https://openalex.org/I51601045 |
| authorships[2].institutions[0].ror | https://ror.org/002h8g185 |
| authorships[2].institutions[0].type | education |
| authorships[2].institutions[0].lineage | https://openalex.org/I51601045 |
| authorships[2].institutions[0].country_code | GB |
| authorships[2].institutions[0].display_name | University of Bath |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Vinay P Namboodiri |
| authorships[2].is_corresponding | False |
| authorships[2].raw_affiliation_strings | #N# University of Bath, UK#N# |
| authorships[3].author.id | https://openalex.org/A5085503354 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-6142-7724 |
| authorships[3].author.display_name | Rajesh M. Hegde |
| authorships[3].countries | IN |
| authorships[3].affiliations[0].institution_ids | https://openalex.org/I94234084 |
| authorships[3].affiliations[0].raw_affiliation_string | Indian Institute of Technology - Kanpur, India#TAB# |
| authorships[3].institutions[0].id | https://openalex.org/I94234084 |
| authorships[3].institutions[0].ror | https://ror.org/05pjsgx75 |
| authorships[3].institutions[0].type | education |
| authorships[3].institutions[0].lineage | https://openalex.org/I94234084 |
| authorships[3].institutions[0].country_code | IN |
| authorships[3].institutions[0].display_name | Indian Institute of Technology Kanpur |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Rajesh M Hegde |
| authorships[3].is_corresponding | False |
| authorships[3].raw_affiliation_strings | Indian Institute of Technology - Kanpur, India#TAB# |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2011.07340 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Speech Prediction in Silent Videos Using Variational Autoencoders |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T03:46:38.306776 |
| primary_topic.id | https://openalex.org/T10860 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 1.0 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1711 |
| primary_topic.subfield.display_name | Signal Processing |
| primary_topic.display_name | Speech and Audio Processing |
| related_works | https://openalex.org/W3045032902, https://openalex.org/W3187009280, https://openalex.org/W3098252333, https://openalex.org/W2611160234, https://openalex.org/W1480583224, https://openalex.org/W3209013111, https://openalex.org/W2990467045, https://openalex.org/W2056961398, https://openalex.org/W3141688548, https://openalex.org/W3015925607, https://openalex.org/W2995255435, https://openalex.org/W2613448434, https://openalex.org/W3036496243, https://openalex.org/W2461011248, https://openalex.org/W2982076115, https://openalex.org/W2186282052, https://openalex.org/W2963917086, https://openalex.org/W3093287838, https://openalex.org/W2946520073, https://openalex.org/W2594156432 |
| cited_by_count | 1 |
| counts_by_year[0].year | 2024 |
| counts_by_year[0].cited_by_count | 1 |
| locations_count | 5 |
| best_oa_location.id | pmh:oai:arXiv.org:2011.07340 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | cc-by-nc-nd |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2011.07340 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by-nc-nd |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2011.07340 |
| primary_location.id | doi:10.1109/icassp39728.2021.9414040 |
| primary_location.is_oa | False |
| primary_location.source | |
| primary_location.license | |
| primary_location.pdf_url | |
| primary_location.version | publishedVersion |
| primary_location.raw_type | proceedings-article |
| primary_location.license_id | |
| primary_location.is_accepted | True |
| primary_location.is_published | True |
| primary_location.raw_source_name | ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
| primary_location.landing_page_url | https://doi.org/10.1109/icassp39728.2021.9414040 |
| publication_date | 2021-05-13 |
| publication_year | 2021 |
| referenced_works | https://openalex.org/W6751750676, https://openalex.org/W3035626590, https://openalex.org/W2964243274, https://openalex.org/W6640963894, https://openalex.org/W6639732818, https://openalex.org/W6637373629, https://openalex.org/W2015143272, https://openalex.org/W2067295501, https://openalex.org/W2516001803, https://openalex.org/W1552314771, https://openalex.org/W2625027024, https://openalex.org/W2293856338, https://openalex.org/W2963019222, https://openalex.org/W2585824449, https://openalex.org/W2064675550, https://openalex.org/W2964352155, https://openalex.org/W2963609956, https://openalex.org/W6756197946, https://openalex.org/W2972563022, https://openalex.org/W1959608418, https://openalex.org/W2962835968, https://openalex.org/W2964095416, https://openalex.org/W2962897886 |
| referenced_works_count | 23 |
| abstract_inverted_index.a | 64, 99, 106 |
| abstract_inverted_index.In | 94 |
| abstract_inverted_index.It | 72 |
| abstract_inverted_index.We | 132 |
| abstract_inverted_index.as | 78 |
| abstract_inverted_index.in | 105 |
| abstract_inverted_index.is | 9, 34, 45 |
| abstract_inverted_index.of | 39, 50, 136 |
| abstract_inverted_index.on | 139, 144 |
| abstract_inverted_index.or | 29 |
| abstract_inverted_index.to | 24, 75, 82, 121 |
| abstract_inverted_index.we | 97 |
| abstract_inverted_index.The | 109 |
| abstract_inverted_index.and | 6, 20, 42, 58, 116 |
| abstract_inverted_index.can | 73 |
| abstract_inverted_index.for | 11, 102 |
| abstract_inverted_index.our | 137 |
| abstract_inverted_index.the | 1, 4, 37, 51, 55, 69, 79, 84, 90, 123, 129, 134, 140 |
| abstract_inverted_index.two | 70 |
| abstract_inverted_index.GRID | 141 |
| abstract_inverted_index.both | 40 |
| abstract_inverted_index.data | 92 |
| abstract_inverted_index.deep | 118 |
| abstract_inverted_index.from | 16 |
| abstract_inverted_index.full | 91 |
| abstract_inverted_index.lead | 74 |
| abstract_inverted_index.many | 12 |
| abstract_inverted_index.most | 49 |
| abstract_inverted_index.only | 62 |
| abstract_inverted_index.than | 88 |
| abstract_inverted_index.that | 60 |
| abstract_inverted_index.this | 33, 95 |
| abstract_inverted_index.with | 27 |
| abstract_inverted_index.(CGI) | 19 |
| abstract_inverted_index.audio | 41 |
| abstract_inverted_index.based | 143 |
| abstract_inverted_index.given | 128 |
| abstract_inverted_index.learn | 122 |
| abstract_inverted_index.model | 80, 101, 111, 138 |
| abstract_inverted_index.since | 36 |
| abstract_inverted_index.there | 61 |
| abstract_inverted_index.video | 21 |
| abstract_inverted_index.aspect | 57 |
| abstract_inverted_index.assume | 59 |
| abstract_inverted_index.exists | 63 |
| abstract_inverted_index.ignore | 54 |
| abstract_inverted_index.models | 120 |
| abstract_inverted_index.neural | 114 |
| abstract_inverted_index.paper, | 96 |
| abstract_inverted_index.people | 26 |
| abstract_inverted_index.rather | 87 |
| abstract_inverted_index.silent | 107 |
| abstract_inverted_index.speech | 104 |
| abstract_inverted_index.video. | 108 |
| abstract_inverted_index.visual | 7, 30, 43, 130 |
| abstract_inverted_index.average | 85 |
| abstract_inverted_index.between | 3, 68 |
| abstract_inverted_index.crucial | 10 |
| abstract_inverted_index.dataset | 142 |
| abstract_inverted_index.editing | 22 |
| abstract_inverted_index.hearing | 28 |
| abstract_inverted_index.imagery | 18 |
| abstract_inverted_index.mapping | 67 |
| abstract_inverted_index.methods | 53 |
| abstract_inverted_index.present | 98 |
| abstract_inverted_index.ranging | 15 |
| abstract_inverted_index.signal. | 131 |
| abstract_inverted_index.signals | 8 |
| abstract_inverted_index.However, | 32 |
| abstract_inverted_index.auditory | 5, 124 |
| abstract_inverted_index.behavior | 86 |
| abstract_inverted_index.combines | 112 |
| abstract_inverted_index.existing | 52 |
| abstract_inverted_index.learning | 89 |
| abstract_inverted_index.modality | 44 |
| abstract_inverted_index.networks | 115 |
| abstract_inverted_index.proposed | 110 |
| abstract_inverted_index.signal's | 125 |
| abstract_inverted_index.standard | 145 |
| abstract_inverted_index.assisting | 25 |
| abstract_inverted_index.collapses | 81 |
| abstract_inverted_index.different | 13 |
| abstract_inverted_index.recurrent | 113 |
| abstract_inverted_index.Therefore, | 48 |
| abstract_inverted_index.automation | 23 |
| abstract_inverted_index.generating | 103 |
| abstract_inverted_index.generative | 119 |
| abstract_inverted_index.inherently | 46 |
| abstract_inverted_index.multimodal | 56 |
| abstract_inverted_index.one-to-one | 66 |
| abstract_inverted_index.optimizing | 83 |
| abstract_inverted_index.stochastic | 100 |
| abstract_inverted_index.benchmarks. | 146 |
| abstract_inverted_index.challenging | 35 |
| abstract_inverted_index.conditional | 126 |
| abstract_inverted_index.demonstrate | 133 |
| abstract_inverted_index.low-quality | 76 |
| abstract_inverted_index.modalities. | 71 |
| abstract_inverted_index.multimodal. | 47 |
| abstract_inverted_index.performance | 135 |
| abstract_inverted_index.predictions | 77 |
| abstract_inverted_index.variational | 117 |
| abstract_inverted_index.applications | 14 |
| abstract_inverted_index.distribution | 38, 127 |
| abstract_inverted_index.impairments. | 31 |
| abstract_inverted_index.relationship | 2 |
| abstract_inverted_index.Understanding | 0 |
| abstract_inverted_index.deterministic | 65 |
| abstract_inverted_index.distributions. | 93 |
| abstract_inverted_index.computer-generated | 17 |
| cited_by_percentile_year.max | 94 |
| cited_by_percentile_year.min | 90 |
| countries_distinct_count | 2 |
| institutions_distinct_count | 4 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/16 |
| sustainable_development_goals[0].score | 0.6200000047683716 |
| sustainable_development_goals[0].display_name | Peace, Justice and strong institutions |
| citation_normalized_percentile.value | 0.40257165 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | False |
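
The `abstract_inverted_index.*` rows in the payload above store the abstract as a map from each word to the zero-based positions where it occurs. A minimal sketch of turning such a map back into readable text follows, assuming the index is available as a Python dict of word to position list (the shape OpenAlex uses in its JSON). The function name `rebuild_abstract` and the `sample` dict, which holds only the first nine positions from the table, are illustrative.

```python
def rebuild_abstract(inverted_index):
    """Rebuild plain text from a word -> [positions] inverted index."""
    words_by_position = {}
    for word, positions in inverted_index.items():
        for position in positions:
            words_by_position[position] = word
    # Emit the words in ascending position order, separated by single spaces.
    return " ".join(words_by_position[p] for p in sorted(words_by_position))


# First nine positions only, copied from the payload rows above.
sample = {
    "Understanding": [0],
    "the": [1, 4],
    "relationship": [2],
    "between": [3],
    "auditory": [5],
    "and": [6],
    "visual": [7],
    "signals": [8],
}

print(rebuild_abstract(sample))
```

Punctuation stays attached to the indexed tokens (for example `paper,` and `video.`), so joining on single spaces is enough to recover the original sentence boundaries.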