Speech Prediction in Silent Videos Using Variational Autoencoders
May 2021 • Ravindra Yadav, Ashish Sardana, Vinay P. Namboodiri, Rajesh M. Hegde
Understanding the relationship between auditory and visual signals is crucial for many applications, ranging from computer-generated imagery (CGI) and video editing automation to assisting people with hearing or visual impairments. However, this is challenging because the distributions of both the audio and visual modalities are inherently multimodal. Most existing methods therefore ignore the multimodal aspect and assume that there exists only a deterministic one-to-one mapping between the two modaliti…