Speech Prediction in Silent Videos Using Variational Autoencoders

Ravindra Yadav, Ashish Sardana, Vinay P. Namboodiri, Rajesh M. Hegde

2021 · Open Access · DOI: https://doi.org/10.1109/icassp39728.2021.9414040

Understanding the relationship between the auditory and visual signals is crucial for many applications, ranging from computer-generated imagery (CGI) and video editing automation to assisting people with hearing or visual impairments. However, this is challenging because the distributions of both the audio and visual modalities are inherently multimodal. Most existing methods nevertheless ignore the multimodal aspect and assume a deterministic one-to-one mapping between the two modalities. This can lead to low-quality predictions, as the model collapses to optimizing the average behavior rather than learning the full data distributions. In this paper, we present a stochastic model for generating speech in a silent video. The proposed model combines recurrent neural networks and variational deep generative models to learn the auditory signal's conditional distribution given the visual signal. We demonstrate the performance of our model on the GRID dataset based on standard benchmarks.
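
The abstract does not state the training objective. As a hedged illustration only, a conditional variational autoencoder of the kind described is commonly trained by maximizing a lower bound on the conditional log-likelihood. The symbols below are assumptions introduced here, not notation from the paper: a is the audio signal, v the visual signal, z a latent variable, q_phi an approximate posterior, and p_theta the prior and decoder.

```latex
\log p_\theta(a \mid v) \;\ge\;
  \mathbb{E}_{q_\phi(z \mid a, v)}\!\left[ \log p_\theta(a \mid z, v) \right]
  \;-\; D_{\mathrm{KL}}\!\left( q_\phi(z \mid a, v) \,\|\, p_\theta(z \mid v) \right)
```

Under this generic bound, sampling different values of z for the same v yields different plausible audio outputs, which is what distinguishes a stochastic model from a deterministic one-to-one mapping.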

Related Topics

Computer Science

Artificial Intelligence

Free-Ranging Dog

Visualization (Graphics)

Concepts

Computer science, Speech recognition, Modalities, Modality (human–computer interaction), Artificial intelligence, Ranging, Generative model, Visualization, SIGNAL (programming language), Grid, Artificial neural network, Generative grammar, Machine learning, Telecommunications, Programming language, Social science, Mathematics, Geometry, Sociology

Metadata

Type: preprint
Language: en
Landing Page: https://doi.org/10.1109/icassp39728.2021.9414040
OA Status: green
Cited By: 1
References: 23
Related Works: 20
OpenAlex ID: https://openalex.org/W3103085143

All OpenAlex metadata

OpenAlex ID: https://openalex.org/W3103085143
Canonical identifier for this work in OpenAlex

DOI: https://doi.org/10.1109/icassp39728.2021.9414040
Digital Object Identifier

Title: Speech Prediction in Silent Videos Using Variational Autoencoders
Work title

Type: preprint
OpenAlex work type

Language: en
Primary language

Publication year: 2021
Year of publication

Publication date: 2021-05-13
Full publication date if available

Authors: Ravindra Yadav, Ashish Sardana, Vinay P. Namboodiri, Rajesh M. Hegde
List of authors in order

Landing page: https://doi.org/10.1109/icassp39728.2021.9414040
Publisher landing page

Open access: Yes
Whether a free full text is available

OA status: green
Open access status per OpenAlex

OA URL: https://arxiv.org/pdf/2011.07340
Direct OA link when available

Concepts: Computer science, Speech recognition, Modalities, Modality (human–computer interaction), Artificial intelligence, Ranging, Generative model, Visualization, SIGNAL (programming language), Grid, Artificial neural network, Generative grammar, Machine learning, Telecommunications, Programming language, Social science, Mathematics, Geometry, Sociology
Top concepts (fields/topics) attached by OpenAlex

Cited by: 1
Total citation count in OpenAlex

Citations by year (recent): 2024: 1
Per-year citation counts (last 5 years)

References (count): 23
Number of works referenced by this work

Related works (count): 20
Other works algorithmically related by OpenAlex

Full payload

id	https://openalex.org/W3103085143
doi	https://doi.org/10.1109/icassp39728.2021.9414040
ids.doi	https://doi.org/10.1109/icassp39728.2021.9414040
ids.mag	3103085143
ids.openalex	https://openalex.org/W3103085143
fwci	0.14397953
type	preprint
title	Speech Prediction in Silent Videos Using Variational Autoencoders
biblio.issue
biblio.volume
biblio.last_page	7052
biblio.first_page	7048
topics[0].id	https://openalex.org/T10860
topics[0].field.id	https://openalex.org/fields/17
topics[0].field.display_name	Computer Science
topics[0].score	1.0
topics[0].domain.id	https://openalex.org/domains/3
topics[0].domain.display_name	Physical Sciences
topics[0].subfield.id	https://openalex.org/subfields/1711
topics[0].subfield.display_name	Signal Processing
topics[0].display_name	Speech and Audio Processing
topics[1].id	https://openalex.org/T11309
topics[1].field.id	https://openalex.org/fields/17
topics[1].field.display_name	Computer Science
topics[1].score	0.9993000030517578
topics[1].domain.id	https://openalex.org/domains/3
topics[1].domain.display_name	Physical Sciences
topics[1].subfield.id	https://openalex.org/subfields/1711
topics[1].subfield.display_name	Signal Processing
topics[1].display_name	Music and Audio Processing
topics[2].id	https://openalex.org/T11439
topics[2].field.id	https://openalex.org/fields/17
topics[2].field.display_name	Computer Science
topics[2].score	0.9933000206947327
topics[2].domain.id	https://openalex.org/domains/3
topics[2].domain.display_name	Physical Sciences
topics[2].subfield.id	https://openalex.org/subfields/1707
topics[2].subfield.display_name	Computer Vision and Pattern Recognition
topics[2].display_name	Video Analysis and Summarization
is_xpac	False
apc_list
apc_paid
concepts[0].id	https://openalex.org/C41008148
concepts[0].level	0
concepts[0].score	0.7863545417785645
concepts[0].wikidata	https://www.wikidata.org/wiki/Q21198
concepts[0].display_name	Computer science
concepts[1].id	https://openalex.org/C28490314
concepts[1].level	1
concepts[1].score	0.55987948179245
concepts[1].wikidata	https://www.wikidata.org/wiki/Q189436
concepts[1].display_name	Speech recognition
concepts[2].id	https://openalex.org/C2779903281
concepts[2].level	2
concepts[2].score	0.5275102257728577
concepts[2].wikidata	https://www.wikidata.org/wiki/Q6888026
concepts[2].display_name	Modalities
concepts[3].id	https://openalex.org/C2780226545
concepts[3].level	2
concepts[3].score	0.5226906538009644
concepts[3].wikidata	https://www.wikidata.org/wiki/Q6888030
concepts[3].display_name	Modality (human–computer interaction)
concepts[4].id	https://openalex.org/C154945302
concepts[4].level	1
concepts[4].score	0.5217869281768799
concepts[4].wikidata	https://www.wikidata.org/wiki/Q11660
concepts[4].display_name	Artificial intelligence
concepts[5].id	https://openalex.org/C115051666
concepts[5].level	2
concepts[5].score	0.5129973292350769
concepts[5].wikidata	https://www.wikidata.org/wiki/Q6522493
concepts[5].display_name	Ranging
concepts[6].id	https://openalex.org/C167966045
concepts[6].level	3
concepts[6].score	0.4808812439441681
concepts[6].wikidata	https://www.wikidata.org/wiki/Q5532625
concepts[6].display_name	Generative model
concepts[7].id	https://openalex.org/C36464697
concepts[7].level	2
concepts[7].score	0.4383709728717804
concepts[7].wikidata	https://www.wikidata.org/wiki/Q451553
concepts[7].display_name	Visualization
concepts[8].id	https://openalex.org/C2779843651
concepts[8].level	2
concepts[8].score	0.42874160408973694
concepts[8].wikidata	https://www.wikidata.org/wiki/Q7390335
concepts[8].display_name	SIGNAL (programming language)
concepts[9].id	https://openalex.org/C187691185
concepts[9].level	2
concepts[9].score	0.42590227723121643
concepts[9].wikidata	https://www.wikidata.org/wiki/Q2020720
concepts[9].display_name	Grid
concepts[10].id	https://openalex.org/C50644808
concepts[10].level	2
concepts[10].score	0.4198591113090515
concepts[10].wikidata	https://www.wikidata.org/wiki/Q192776
concepts[10].display_name	Artificial neural network
concepts[11].id	https://openalex.org/C39890363
concepts[11].level	2
concepts[11].score	0.4080892503261566
concepts[11].wikidata	https://www.wikidata.org/wiki/Q36108
concepts[11].display_name	Generative grammar
concepts[12].id	https://openalex.org/C119857082
concepts[12].level	1
concepts[12].score	0.3395358622074127
concepts[12].wikidata	https://www.wikidata.org/wiki/Q2539
concepts[12].display_name	Machine learning
concepts[13].id	https://openalex.org/C76155785
concepts[13].level	1
concepts[13].score	0.0
concepts[13].wikidata	https://www.wikidata.org/wiki/Q418
concepts[13].display_name	Telecommunications
concepts[14].id	https://openalex.org/C199360897
concepts[14].level	1
concepts[14].score	0.0
concepts[14].wikidata	https://www.wikidata.org/wiki/Q9143
concepts[14].display_name	Programming language
concepts[15].id	https://openalex.org/C36289849
concepts[15].level	1
concepts[15].score	0.0
concepts[15].wikidata	https://www.wikidata.org/wiki/Q34749
concepts[15].display_name	Social science
concepts[16].id	https://openalex.org/C33923547
concepts[16].level	0
concepts[16].score	0.0
concepts[16].wikidata	https://www.wikidata.org/wiki/Q395
concepts[16].display_name	Mathematics
concepts[17].id	https://openalex.org/C2524010
concepts[17].level	1
concepts[17].score	0.0
concepts[17].wikidata	https://www.wikidata.org/wiki/Q8087
concepts[17].display_name	Geometry
concepts[18].id	https://openalex.org/C144024400
concepts[18].level	0
concepts[18].score	0.0
concepts[18].wikidata	https://www.wikidata.org/wiki/Q21201
concepts[18].display_name	Sociology
keywords[0].id	https://openalex.org/keywords/computer-science
keywords[0].score	0.7863545417785645
keywords[0].display_name	Computer science
keywords[1].id	https://openalex.org/keywords/speech-recognition
keywords[1].score	0.55987948179245
keywords[1].display_name	Speech recognition
keywords[2].id	https://openalex.org/keywords/modalities
keywords[2].score	0.5275102257728577
keywords[2].display_name	Modalities
keywords[3].id	https://openalex.org/keywords/modality
keywords[3].score	0.5226906538009644
keywords[3].display_name	Modality (human–computer interaction)
keywords[4].id	https://openalex.org/keywords/artificial-intelligence
keywords[4].score	0.5217869281768799
keywords[4].display_name	Artificial intelligence
keywords[5].id	https://openalex.org/keywords/ranging
keywords[5].score	0.5129973292350769
keywords[5].display_name	Ranging
keywords[6].id	https://openalex.org/keywords/generative-model
keywords[6].score	0.4808812439441681
keywords[6].display_name	Generative model
keywords[7].id	https://openalex.org/keywords/visualization
keywords[7].score	0.4383709728717804
keywords[7].display_name	Visualization
keywords[8].id	https://openalex.org/keywords/signal
keywords[8].score	0.42874160408973694
keywords[8].display_name	SIGNAL (programming language)
keywords[9].id	https://openalex.org/keywords/grid
keywords[9].score	0.42590227723121643
keywords[9].display_name	Grid
keywords[10].id	https://openalex.org/keywords/artificial-neural-network
keywords[10].score	0.4198591113090515
keywords[10].display_name	Artificial neural network
keywords[11].id	https://openalex.org/keywords/generative-grammar
keywords[11].score	0.4080892503261566
keywords[11].display_name	Generative grammar
keywords[12].id	https://openalex.org/keywords/machine-learning
keywords[12].score	0.3395358622074127
keywords[12].display_name	Machine learning
language	en
locations[0].id	doi:10.1109/icassp39728.2021.9414040
locations[0].is_oa	False
locations[0].source
locations[0].license
locations[0].pdf_url
locations[0].version	publishedVersion
locations[0].raw_type	proceedings-article
locations[0].license_id
locations[0].is_accepted	True
locations[0].is_published	True
locations[0].raw_source_name	ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
locations[0].landing_page_url	https://doi.org/10.1109/icassp39728.2021.9414040
locations[1].id	pmh:oai:arXiv.org:2011.07340
locations[1].is_oa	True
locations[1].source.id	https://openalex.org/S4306400194
locations[1].source.issn
locations[1].source.type	repository
locations[1].source.is_oa	True
locations[1].source.issn_l
locations[1].source.is_core	False
locations[1].source.is_in_doaj	False
locations[1].source.display_name	arXiv (Cornell University)
locations[1].source.host_organization	https://openalex.org/I205783295
locations[1].source.host_organization_name	Cornell University
locations[1].source.host_organization_lineage	https://openalex.org/I205783295
locations[1].license	cc-by-nc-nd
locations[1].pdf_url	https://arxiv.org/pdf/2011.07340
locations[1].version	submittedVersion
locations[1].raw_type	text
locations[1].license_id	https://openalex.org/licenses/cc-by-nc-nd
locations[1].is_accepted	False
locations[1].is_published	False
locations[1].raw_source_name
locations[1].landing_page_url	http://arxiv.org/abs/2011.07340
locations[2].id	mag:3103085143
locations[2].is_oa	True
locations[2].source.id	https://openalex.org/S4306400194
locations[2].source.issn
locations[2].source.type	repository
locations[2].source.is_oa	True
locations[2].source.issn_l
locations[2].source.is_core	False
locations[2].source.is_in_doaj	False
locations[2].source.display_name	arXiv (Cornell University)
locations[2].source.host_organization	https://openalex.org/I205783295
locations[2].source.host_organization_name	Cornell University
locations[2].source.host_organization_lineage	https://openalex.org/I205783295
locations[2].license
locations[2].pdf_url
locations[2].version	submittedVersion
locations[2].raw_type
locations[2].license_id
locations[2].is_accepted	False
locations[2].is_published	False
locations[2].raw_source_name	arXiv (Cornell University)
locations[2].landing_page_url	https://arxiv.org/pdf/2011.07340.pdf
locations[3].id	doi:10.48550/arxiv.2011.07340
locations[3].is_oa	True
locations[3].source.id	https://openalex.org/S4306400194
locations[3].source.issn
locations[3].source.type	repository
locations[3].source.is_oa	True
locations[3].source.issn_l
locations[3].source.is_core	False
locations[3].source.is_in_doaj	False
locations[3].source.display_name	arXiv (Cornell University)
locations[3].source.host_organization	https://openalex.org/I205783295
locations[3].source.host_organization_name	Cornell University
locations[3].source.host_organization_lineage	https://openalex.org/I205783295
locations[3].license
locations[3].pdf_url
locations[3].version
locations[3].raw_type	article
locations[3].license_id
locations[3].is_accepted	False
locations[3].is_published
locations[3].raw_source_name
locations[3].landing_page_url	https://doi.org/10.48550/arxiv.2011.07340
locations[4].id	doi:10.17023/x8zh-0b82
locations[4].is_oa	True
locations[4].source.id	https://openalex.org/S7407051697
locations[4].source.type	repository
locations[4].source.is_oa	False
locations[4].source.issn_l
locations[4].source.is_core	False
locations[4].source.is_in_doaj	False
locations[4].source.display_name	IEEE RESOURCE CENTERS
locations[4].source.host_organization
locations[4].source.host_organization_name
locations[4].license
locations[4].pdf_url
locations[4].version
locations[4].raw_type	article
locations[4].license_id
locations[4].is_accepted	False
locations[4].is_published
locations[4].raw_source_name
locations[4].landing_page_url	https://doi.org/10.17023/x8zh-0b82
indexed_in	arxiv, crossref, datacite
authorships[0].author.id	https://openalex.org/A5010648323
authorships[0].author.orcid	https://orcid.org/0000-0003-4628-0688
authorships[0].author.display_name	Ravindra Yadav
authorships[0].countries	IN
authorships[0].affiliations[0].institution_ids	https://openalex.org/I94234084
authorships[0].affiliations[0].raw_affiliation_string	Indian Institute of Technology - Kanpur, India
authorships[0].institutions[0].id	https://openalex.org/I94234084
authorships[0].institutions[0].ror	https://ror.org/05pjsgx75
authorships[0].institutions[0].type	education
authorships[0].institutions[0].lineage	https://openalex.org/I94234084
authorships[0].institutions[0].country_code	IN
authorships[0].institutions[0].display_name	Indian Institute of Technology Kanpur
authorships[0].author_position	first
authorships[0].raw_author_name	Ravindra Yadav
authorships[0].is_corresponding	False
authorships[0].raw_affiliation_strings	Indian Institute of Technology - Kanpur, India
authorships[1].author.id	https://openalex.org/A5083439972
authorships[1].author.orcid
authorships[1].author.display_name	Ashish Sardana
authorships[1].countries	GB
authorships[1].affiliations[0].institution_ids	https://openalex.org/I1304085615
authorships[1].affiliations[0].raw_affiliation_string	nVidia
authorships[1].institutions[0].id	https://openalex.org/I1304085615
authorships[1].institutions[0].ror	https://ror.org/02kr42612
authorships[1].institutions[0].type	company
authorships[1].institutions[0].lineage	https://openalex.org/I1304085615, https://openalex.org/I4210127875
authorships[1].institutions[0].country_code	GB
authorships[1].institutions[0].display_name	Nvidia (United Kingdom)
authorships[1].author_position	middle
authorships[1].raw_author_name	Ashish Sardana
authorships[1].is_corresponding	False
authorships[1].raw_affiliation_strings	nVidia
authorships[2].author.id	https://openalex.org/A5007109424
authorships[2].author.orcid	https://orcid.org/0000-0001-5262-9722
authorships[2].author.display_name	Vinay P. Namboodiri
authorships[2].countries	GB
authorships[2].affiliations[0].institution_ids	https://openalex.org/I51601045
authorships[2].affiliations[0].raw_affiliation_string	University of Bath, UK
authorships[2].institutions[0].id	https://openalex.org/I51601045
authorships[2].institutions[0].ror	https://ror.org/002h8g185
authorships[2].institutions[0].type	education
authorships[2].institutions[0].lineage	https://openalex.org/I51601045
authorships[2].institutions[0].country_code	GB
authorships[2].institutions[0].display_name	University of Bath
authorships[2].author_position	middle
authorships[2].raw_author_name	Vinay P Namboodiri
authorships[2].is_corresponding	False
authorships[2].raw_affiliation_strings	University of Bath, UK
authorships[3].author.id	https://openalex.org/A5085503354
authorships[3].author.orcid	https://orcid.org/0000-0002-6142-7724
authorships[3].author.display_name	Rajesh M. Hegde
authorships[3].countries	IN
authorships[3].affiliations[0].institution_ids	https://openalex.org/I94234084
authorships[3].affiliations[0].raw_affiliation_string	Indian Institute of Technology - Kanpur, India
authorships[3].institutions[0].id	https://openalex.org/I94234084
authorships[3].institutions[0].ror	https://ror.org/05pjsgx75
authorships[3].institutions[0].type	education
authorships[3].institutions[0].lineage	https://openalex.org/I94234084
authorships[3].institutions[0].country_code	IN
authorships[3].institutions[0].display_name	Indian Institute of Technology Kanpur
authorships[3].author_position	last
authorships[3].raw_author_name	Rajesh M Hegde
authorships[3].is_corresponding	False
authorships[3].raw_affiliation_strings	Indian Institute of Technology - Kanpur, India
has_content.pdf	True
has_content.grobid_xml	True
is_paratext	False
open_access.is_oa	True
open_access.oa_url	https://arxiv.org/pdf/2011.07340
open_access.oa_status	green
open_access.any_repository_has_fulltext	False
created_date	2025-10-10T00:00:00
display_name	Speech Prediction in Silent Videos Using Variational Autoencoders
has_fulltext	False
is_retracted	False
updated_date	2025-11-06T03:46:38.306776
primary_topic.id	https://openalex.org/T10860
primary_topic.field.id	https://openalex.org/fields/17
primary_topic.field.display_name	Computer Science
primary_topic.score	1.0
primary_topic.domain.id	https://openalex.org/domains/3
primary_topic.domain.display_name	Physical Sciences
primary_topic.subfield.id	https://openalex.org/subfields/1711
primary_topic.subfield.display_name	Signal Processing
primary_topic.display_name	Speech and Audio Processing
related_works	https://openalex.org/W3045032902, https://openalex.org/W3187009280, https://openalex.org/W3098252333, https://openalex.org/W2611160234, https://openalex.org/W1480583224, https://openalex.org/W3209013111, https://openalex.org/W2990467045, https://openalex.org/W2056961398, https://openalex.org/W3141688548, https://openalex.org/W3015925607, https://openalex.org/W2995255435, https://openalex.org/W2613448434, https://openalex.org/W3036496243, https://openalex.org/W2461011248, https://openalex.org/W2982076115, https://openalex.org/W2186282052, https://openalex.org/W2963917086, https://openalex.org/W3093287838, https://openalex.org/W2946520073, https://openalex.org/W2594156432
cited_by_count	1
counts_by_year[0].year	2024
counts_by_year[0].cited_by_count	1
locations_count	5
best_oa_location.id	pmh:oai:arXiv.org:2011.07340
best_oa_location.is_oa	True
best_oa_location.source.id	https://openalex.org/S4306400194
best_oa_location.source.issn
best_oa_location.source.type	repository
best_oa_location.source.is_oa	True
best_oa_location.source.issn_l
best_oa_location.source.is_core	False
best_oa_location.source.is_in_doaj	False
best_oa_location.source.display_name	arXiv (Cornell University)
best_oa_location.source.host_organization	https://openalex.org/I205783295
best_oa_location.source.host_organization_name	Cornell University
best_oa_location.source.host_organization_lineage	https://openalex.org/I205783295
best_oa_location.license	cc-by-nc-nd
best_oa_location.pdf_url	https://arxiv.org/pdf/2011.07340
best_oa_location.version	submittedVersion
best_oa_location.raw_type	text
best_oa_location.license_id	https://openalex.org/licenses/cc-by-nc-nd
best_oa_location.is_accepted	False
best_oa_location.is_published	False
best_oa_location.raw_source_name
best_oa_location.landing_page_url	http://arxiv.org/abs/2011.07340
primary_location.id	doi:10.1109/icassp39728.2021.9414040
primary_location.is_oa	False
primary_location.source
primary_location.license
primary_location.pdf_url
primary_location.version	publishedVersion
primary_location.raw_type	proceedings-article
primary_location.license_id
primary_location.is_accepted	True
primary_location.is_published	True
primary_location.raw_source_name	ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
primary_location.landing_page_url	https://doi.org/10.1109/icassp39728.2021.9414040
publication_date	2021-05-13
publication_year	2021
referenced_works	https://openalex.org/W6751750676, https://openalex.org/W3035626590, https://openalex.org/W2964243274, https://openalex.org/W6640963894, https://openalex.org/W6639732818, https://openalex.org/W6637373629, https://openalex.org/W2015143272, https://openalex.org/W2067295501, https://openalex.org/W2516001803, https://openalex.org/W1552314771, https://openalex.org/W2625027024, https://openalex.org/W2293856338, https://openalex.org/W2963019222, https://openalex.org/W2585824449, https://openalex.org/W2064675550, https://openalex.org/W2964352155, https://openalex.org/W2963609956, https://openalex.org/W6756197946, https://openalex.org/W2972563022, https://openalex.org/W1959608418, https://openalex.org/W2962835968, https://openalex.org/W2964095416, https://openalex.org/W2962897886
referenced_works_count	23
abstract_inverted_index.a	64, 99, 106
abstract_inverted_index.In	94
abstract_inverted_index.It	72
abstract_inverted_index.We	132
abstract_inverted_index.as	78
abstract_inverted_index.in	105
abstract_inverted_index.is	9, 34, 45
abstract_inverted_index.of	39, 50, 136
abstract_inverted_index.on	139, 144
abstract_inverted_index.or	29
abstract_inverted_index.to	24, 75, 82, 121
abstract_inverted_index.we	97
abstract_inverted_index.The	109
abstract_inverted_index.and	6, 20, 42, 58, 116
abstract_inverted_index.can	73
abstract_inverted_index.for	11, 102
abstract_inverted_index.our	137
abstract_inverted_index.the	1, 4, 37, 51, 55, 69, 79, 84, 90, 123, 129, 134, 140
abstract_inverted_index.two	70
abstract_inverted_index.GRID	141
abstract_inverted_index.both	40
abstract_inverted_index.data	92
abstract_inverted_index.deep	118
abstract_inverted_index.from	16
abstract_inverted_index.full	91
abstract_inverted_index.lead	74
abstract_inverted_index.many	12
abstract_inverted_index.most	49
abstract_inverted_index.only	62
abstract_inverted_index.than	88
abstract_inverted_index.that	60
abstract_inverted_index.this	33, 95
abstract_inverted_index.with	27
abstract_inverted_index.(CGI)	19
abstract_inverted_index.audio	41
abstract_inverted_index.based	143
abstract_inverted_index.given	128
abstract_inverted_index.learn	122
abstract_inverted_index.model	80, 101, 111, 138
abstract_inverted_index.since	36
abstract_inverted_index.there	61
abstract_inverted_index.video	21
abstract_inverted_index.aspect	57
abstract_inverted_index.assume	59
abstract_inverted_index.exists	63
abstract_inverted_index.ignore	54
abstract_inverted_index.models	120
abstract_inverted_index.neural	114
abstract_inverted_index.paper,	96
abstract_inverted_index.people	26
abstract_inverted_index.rather	87
abstract_inverted_index.silent	107
abstract_inverted_index.speech	104
abstract_inverted_index.video.	108
abstract_inverted_index.visual	7, 30, 43, 130
abstract_inverted_index.average	85
abstract_inverted_index.between	3, 68
abstract_inverted_index.crucial	10
abstract_inverted_index.dataset	142
abstract_inverted_index.editing	22
abstract_inverted_index.hearing	28
abstract_inverted_index.imagery	18
abstract_inverted_index.mapping	67
abstract_inverted_index.methods	53
abstract_inverted_index.present	98
abstract_inverted_index.ranging	15
abstract_inverted_index.signal.	131
abstract_inverted_index.signals	8
abstract_inverted_index.However,	32
abstract_inverted_index.auditory	5, 124
abstract_inverted_index.behavior	86
abstract_inverted_index.combines	112
abstract_inverted_index.existing	52
abstract_inverted_index.learning	89
abstract_inverted_index.modality	44
abstract_inverted_index.networks	115
abstract_inverted_index.proposed	110
abstract_inverted_index.signal's	125
abstract_inverted_index.standard	145
abstract_inverted_index.assisting	25
abstract_inverted_index.collapses	81
abstract_inverted_index.different	13
abstract_inverted_index.recurrent	113
abstract_inverted_index.Therefore,	48
abstract_inverted_index.automation	23
abstract_inverted_index.generating	103
abstract_inverted_index.generative	119
abstract_inverted_index.inherently	46
abstract_inverted_index.multimodal	56
abstract_inverted_index.one-to-one	66
abstract_inverted_index.optimizing	83
abstract_inverted_index.stochastic	100
abstract_inverted_index.benchmarks.	146
abstract_inverted_index.challenging	35
abstract_inverted_index.conditional	126
abstract_inverted_index.demonstrate	133
abstract_inverted_index.low-quality	76
abstract_inverted_index.modalities.	71
abstract_inverted_index.multimodal.	47
abstract_inverted_index.performance	135
abstract_inverted_index.predictions	77
abstract_inverted_index.variational	117
abstract_inverted_index.applications	14
abstract_inverted_index.distribution	38, 127
abstract_inverted_index.impairments.	31
abstract_inverted_index.relationship	2
abstract_inverted_index.Understanding	0
abstract_inverted_index.deterministic	65
abstract_inverted_index.distributions.	93
abstract_inverted_index.computer-generated	17
cited_by_percentile_year.max	94
cited_by_percentile_year.min	90
countries_distinct_count	2
institutions_distinct_count	4
sustainable_development_goals[0].id	https://metadata.un.org/sdg/16
sustainable_development_goals[0].score	0.6200000047683716
sustainable_development_goals[0].display_name	Peace, Justice and strong institutions
citation_normalized_percentile.value	0.40257165
citation_normalized_percentile.is_in_top_1_percent	False
citation_normalized_percentile.is_in_top_10_percent	False
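
The `abstract_inverted_index.*` entries in the payload above store the abstract as a map from each word to the positions where it occurs. The following is a minimal Python sketch of how the readable text could be rebuilt from such entries. The function names `parse_flattened` and `rebuild_abstract` are hypothetical helpers introduced here, and the sketch assumes each flattened line has the form key, tab, comma-separated positions, as the payload appears to.

```python
PREFIX = "abstract_inverted_index."


def parse_flattened(lines):
    """Turn flattened 'abstract_inverted_index.<word><TAB>p1, p2' lines into a dict.

    Assumption: key and value are separated by a single tab character.
    """
    index = {}
    for line in lines:
        key, _, value = line.partition("\t")
        if key.startswith(PREFIX) and value.strip():
            # Strip only the prefix so words with punctuation ("video.") survive.
            index[key[len(PREFIX):]] = [int(p) for p in value.split(",")]
    return index


def rebuild_abstract(inverted_index):
    """Place every word at each of its positions, then join in position order."""
    positions = {}
    for word, indices in inverted_index.items():
        for i in indices:
            positions[i] = word
    return " ".join(positions[i] for i in sorted(positions))


# A small excerpt of the payload, truncated to positions 0-8 for illustration.
sample_lines = [
    "abstract_inverted_index.Understanding\t0",
    "abstract_inverted_index.the\t1, 4",
    "abstract_inverted_index.relationship\t2",
    "abstract_inverted_index.between\t3",
    "abstract_inverted_index.auditory\t5",
    "abstract_inverted_index.and\t6",
    "abstract_inverted_index.visual\t7",
    "abstract_inverted_index.signals\t8",
]

sample_index = parse_flattened(sample_lines)
print(rebuild_abstract(sample_index))
```

Applied to the full set of entries rather than this excerpt, the same two steps should recover the abstract as recorded in the index, with punctuation attached to words as stored.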