Exploring foci of:
IEEE Access • Vol 13
LCGD: Enhancing Text-to-Video Generation via Contextual LLM Guidance and U-Net Denoising
January 2025 • Muhammad Waseem, Muhammad Usman Ghani Khan, Syed Khaldoon Khurshid
Diffusion models have emerged as a leading solution in computer vision, and they excel at audio, image, and video generation by using a Markov chain to map complex latent spaces. These models outperform other generative models such as GANs and VAEs, with their noising and denoising processes modeled after the U-Net architecture, enabling high-quality text-to-image and text-to-video synthesis. However, existing research has largely focused on applications rather than on improving the underlying architectures, leading to l…
Computer Science
Artificial Intelligence