Image and Video Generation
How diffusion models and generative AI create visual content
What it is
Modern image generation primarily uses diffusion models: systems trained to turn random noise into coherent images through gradual denoising, conditioned on a text prompt. During training, images are progressively corrupted with noise and the model learns to reverse this corruption. At inference, the model starts from pure noise and iteratively denoises it, guided by the text prompt.
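The forward corruption and iterative reverse process described above can be sketched in a few lines. This is a toy illustration, not a real model: the noise schedule values are arbitrary, and a neural network's noise prediction is replaced by an oracle that already knows the injected noise, so the deterministic (DDIM-style) reverse updates recover the original data exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear beta noise schedule (illustrative values)
T = 50
betas = np.linspace(1e-4, 0.2, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def q_sample(x0, t, eps):
    """Forward process: corrupt clean data x0 to noise level t."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

x0 = rng.normal(size=(8, 8))      # stand-in for a clean "image"
eps = rng.normal(size=x0.shape)   # noise injected during corruption
xT = q_sample(x0, T - 1, eps)     # heavily corrupted sample

# In a real model, a neural net predicts the noise from (x_t, t, prompt
# embedding). Here an oracle makes the reverse process exact.
def predict_noise(x_t, t):
    return eps

# Deterministic reverse updates from step t down to step 0
x = xT
for t in range(T - 1, 0, -1):
    e = predict_noise(x, t)
    # Estimate the clean image implied by the current noisy sample
    x0_hat = (x - np.sqrt(1.0 - alpha_bars[t]) * e) / np.sqrt(alpha_bars[t])
    # Re-noise the estimate down to the next (lower) noise level
    x = np.sqrt(alpha_bars[t - 1]) * x0_hat + np.sqrt(1.0 - alpha_bars[t - 1]) * e

e = predict_noise(x, 0)
x0_hat = (x - np.sqrt(1.0 - alpha_bars[0]) * e) / np.sqrt(alpha_bars[0])
print(np.allclose(x0_hat, x0))  # True: oracle denoising recovers x0
```

With a trained network in place of the oracle, each step only approximately estimates the noise, which is why inference takes many small steps rather than one jump.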
Models like DALL-E 3 and Stable Diffusion combine a text encoder (often CLIP-based or LLM-based) with a UNet or transformer denoiser. Video generation extends this along the temporal dimension: frames must exhibit coherent motion, which makes it a significantly harder problem.
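In practice, such pipelines strengthen the text conditioning at each denoising step with classifier-free guidance: the denoiser is run both with and without the prompt embedding, and the two noise predictions are combined. A minimal sketch of that combination (the function name and values are illustrative; the scale 7.5 is a common default in such pipelines):

```python
import numpy as np

def cfg_noise(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the text-conditioned one by guidance_scale."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Toy predictions from the same denoiser, with and without the prompt
eps_u = np.array([0.0, 0.0])
eps_c = np.array([1.0, -1.0])
guided = cfg_noise(eps_u, eps_c, 7.5)
print(guided)  # [ 7.5 -7.5]
```

A scale of 1.0 reduces to the plain conditioned prediction; larger scales follow the prompt more closely at some cost to sample diversity.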
The trend is toward multimodal LLMs that natively understand and generate images, rather than relying on separate model pipelines.