
What are Diffusion Models?

Definition
Diffusion models are a class of generative models that learn to reverse a noising process: they start with pure random noise and iteratively remove noise to produce a coherent image. Every leading image generator, including Stable Diffusion, DALL-E 3, Imagen, FLUX, and Midjourney, is a diffusion model, though each varies in architecture and training data.

Diffusion models dominated generative image AI starting in 2022 because they produce higher-quality and more diverse outputs than the GANs (Generative Adversarial Networks) that preceded them. The key insight: instead of teaching one network to go directly from noise to image (which GANs do), train a network to remove a small amount of noise — a much easier task — then iterate 20–50 times to turn noise into a finished image.

Variants include DDPM (the original formulation), DDIM (faster, deterministic sampling), and rectified flow / flow matching (used in Stable Diffusion 3 and FLUX for further quality gains with fewer steps).

How it works

Forward process (training)

During training, noise is gradually added to each image over many steps (typically around 1,000) until it becomes pure Gaussian noise. The network learns to predict the noise that was added at each step, so that it can later be subtracted.
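A useful property of the forward process is that the noisy image at any step can be sampled directly from the original in closed form, with no need to loop through every intermediate step. A minimal NumPy sketch; the linear schedule and its parameters are common illustrative defaults, not tied to any particular model:

```python
import numpy as np

def make_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule; alpha_bar[t] is the cumulative
    fraction of the original signal that survives to step t."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def forward_noise(x0, t, alpha_bar, rng):
    """Sample the noisy image x_t directly from the clean image x0."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps  # eps is the target the network is trained to predict
```

The network's training loss is then simply how far its prediction is from the `eps` that was actually added.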

Reverse process (generation)

At inference, start from pure noise. Run the network 20–50 times, each time subtracting the predicted noise. After enough iterations, a coherent image emerges.
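The loop above can be sketched as a DDIM-style deterministic sampler. Here `predict_eps` is a stand-in for the trained denoising network (a U-Net or diffusion transformer in real systems), and `alpha_bar` is the noise schedule from training:

```python
import numpy as np

def ddim_sample(predict_eps, shape, alpha_bar, steps=50, rng=None):
    """Start from pure noise and repeatedly estimate and remove
    the predicted noise over a small number of iterations."""
    rng = rng or np.random.default_rng(0)
    x = rng.standard_normal(shape)  # pure Gaussian noise
    ts = np.linspace(len(alpha_bar) - 1, 0, steps).astype(int)
    for t, t_prev in zip(ts[:-1], ts[1:]):
        eps = predict_eps(x, t)  # network's noise estimate at step t
        # Estimate the clean image implied by the current noise estimate,
        # then re-noise it to the (lower) noise level of the next step.
        x0_hat = (x - np.sqrt(1 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])
        x = np.sqrt(alpha_bar[t_prev]) * x0_hat + np.sqrt(1 - alpha_bar[t_prev]) * eps
    return x
```

Because DDIM skips most of the 1,000 training steps, 20 to 50 iterations suffice at inference time.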

Text conditioning

Text prompts are converted to embeddings (via CLIP or a similar encoder) and injected into each denoising step, steering the output toward an image matching the prompt's semantics.
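In practice, most samplers strengthen this steering with classifier-free guidance: the network is run both with and without the text embedding, and the two noise predictions are extrapolated toward the conditioned one. A minimal sketch; the default scale of 7.5 mirrors a common Stable Diffusion setting:

```python
import numpy as np

def guided_eps(eps_cond, eps_uncond, guidance_scale=7.5):
    """Classifier-free guidance: push the noise estimate past the
    unconditional prediction, in the direction the prompt implies."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```

Higher guidance scales produce images that follow the prompt more literally, at some cost in diversity and naturalness.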

Common use cases

  • Text-to-image generation (Stable Diffusion, DALL-E, Imagen, FLUX)
  • Text-to-video generation (Sora, Veo, Kling — all diffusion transformers)
  • Image editing via latent-space manipulation (inpainting, outpainting)
  • Super-resolution and upscaling
  • Text-to-audio and music generation (Stable Audio)

Frequently asked questions

Why did diffusion models replace GANs?
Diffusion models produce more diverse outputs (GANs often suffer mode collapse), train more stably, and scale better with more compute and data. Since 2022, their output quality has consistently exceeded that of GANs.
Are all AI image generators diffusion models?
Nearly all leading 2024+ image generators are. The remaining non-diffusion approaches (auto-regressive transformers, GANs) are mostly research-stage or used for specialized cases.
