LCGD: Enhancing Text-to-Video Generation via Contextual LLM Guidance and U-Net Denoising

IEEE Access • Vol 13 • January 2025
Muhammad Waseem, Muhammad Usman Ghani Khan, Syed Khaldoon Khurshid

Diffusion models have emerged as a leading approach in computer vision. They excel at audio, image, and video generation by using a Markov chain to map complex latent spaces. These models outperform other generative models such as GANs and VAEs, and their denoising networks, typically built on the U-Net architecture, enable high-quality text-to-image and text-to-video synthesis. However, existing research has largely focused on applications rather than on improving the underlying architectures, leading to l…
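The abstract mentions the Markov chain that diffusion models use to corrupt data step by step before a U-Net learns to reverse it. As a minimal sketch, assuming the standard DDPM formulation with a linear variance schedule (the schedule values and the function names `linear_beta_schedule`, `forward_step`, and `noise_closed_form` are illustrative assumptions, not taken from LCGD), the forward noising chain and its closed-form shortcut might look like this. It does not implement the paper's LLM guidance or its U-Net denoiser.

```python
import numpy as np


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> np.ndarray:
    """Linearly spaced noise variances beta_1..beta_T (assumed DDPM-style values)."""
    return np.linspace(beta_start, beta_end, num_steps)


def forward_step(x_prev: np.ndarray, beta_t: float,
                 rng: np.random.Generator) -> np.ndarray:
    """One Markov transition q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I)."""
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * noise


def noise_closed_form(x0: np.ndarray, t: int, betas: np.ndarray,
                      rng: np.random.Generator):
    """Sample x_t directly from q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I).

    t is a 0-based index into the schedule. Returns the noisy sample and the
    Gaussian noise used, which is the usual regression target for the denoiser.
    """
    alpha_bar = np.cumprod(1.0 - betas)[t]
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise, noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    betas = linear_beta_schedule(1000)
    x0 = np.full(10_000, 3.0)  # toy "clean data": every value is 3.0
    x_T, _ = noise_closed_form(x0, len(betas) - 1, betas, rng)
    # After the full chain, x_T should be close to a standard normal.
    print(f"mean={x_T.mean():.3f} std={x_T.std():.3f}")
```

The closed form exists because composing Gaussian transitions yields another Gaussian, which is why training can sample any timestep directly instead of running the chain step by step; the iterative `forward_step` is kept only to show the Markov structure.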
