Stable Diffusion
Framework: Text-to-image Generator
"A cat in the snow" → (1) Text Encoder → (2) Generation Model → "intermediate product" (a compressed version of the image) → (3) Decoder → image
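The three-component framework above can be sketched as a toy pipeline. This is only an illustration of the data flow, not how any real system is implemented: the function names (`encode_text`, `generate_latent`, `decode`), the sizes, and the "denoising" rule are all hypothetical stand-ins, and no neural network is involved.

```python
import hashlib
import random

LATENT_SIZE = 8   # hypothetical side length of the compressed image
UPSCALE = 8       # hypothetical: the decoder enlarges each latent cell 8x8
STEPS = 10        # hypothetical number of denoising steps


def encode_text(prompt: str, dim: int = LATENT_SIZE * LATENT_SIZE) -> list:
    """Component 1 (toy text encoder): map a prompt to a fixed-length vector."""
    digest = hashlib.sha256(prompt.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def generate_latent(text_vec: list, steps: int = STEPS, seed: int = 0) -> list:
    """Component 2 (toy generation model): start from noise and refine it
    step by step, conditioned on the text vector."""
    rng = random.Random(seed)
    latent = [rng.gauss(0.0, 1.0) for _ in text_vec]
    for _ in range(steps):
        # A real model would predict noise with a network; this stand-in
        # just moves halfway toward the text vector at every step.
        latent = [z + 0.5 * (t - z) for z, t in zip(latent, text_vec)]
    return latent


def decode(latent: list, size: int = LATENT_SIZE, upscale: int = UPSCALE) -> list:
    """Component 3 (toy decoder): expand the compressed image to full
    resolution with nearest-neighbour upsampling."""
    grid = [latent[r * size:(r + 1) * size] for r in range(size)]
    full = size * upscale
    return [[grid[r // upscale][c // upscale] for c in range(full)]
            for r in range(full)]


def text_to_image(prompt: str) -> list:
    """Chain the three components: text -> vector -> latent -> image."""
    return decode(generate_latent(encode_text(prompt)))


if __name__ == "__main__":
    image = text_to_image("A cat in the snow")
    print(len(image), "x", len(image[0]))
```

The point of the sketch is the division of labour: only the middle stage works iteratively, and it operates on the small compressed representation, while the decoder is applied once at the end.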
Stable Diffusion
https://arxiv.org/abs/2112.10752
[Figure: Stable Diffusion architecture, with the three framework components marked 1, 2, 3. Regions: Pixel Space (with decoder D), Latent Space (diffusion process, denoising U-Net, T denoising steps), Conditioning (semantic map, text, representations, images). Legend: denoising step, cross-attention, switch, skip connection, concat.]
DALL-E series
https://arxiv.org/abs/2204.06125
https://arxiv.org/abs/2102.12092
[Figure: DALL-E pipeline, with the three framework components marked 1, 2, 3. A CLIP objective ties a text encoder to an image encoder; the example caption "a corgi playing a flame throwing trumpet" passes through the text encoder, a prior, and a decoder. Labels on the slide: Autoregressive, Diffusion.]
Imagen
https://imagen.research.google/
https://arxiv.org/abs/2205.11487
[Figure: Imagen pipeline, with the three framework components marked 1, 2, 3. Text "A Golden Retriever dog wearing a blue checkered beret and red dotted turtleneck." → Frozen Text Encoder → Text Embedding → Text-to-Image Diffusion Model → 64×64 image → Super-Resolution Diffusion Model → 256×256 image → Super-Resolution Diffusion Model → 1024×1024 image.]
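The resolution cascade in the Imagen figure (64×64, then 256×256, then 1024×1024) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `toy_text_embedding`, `base_image`, and `upsample` are hypothetical names, and plain nearest-neighbour upsampling stands in for the super-resolution diffusion models, which in the real system are learned models.

```python
import random


def toy_text_embedding(prompt: str) -> list:
    """Stand-in for the frozen text encoder: one number per character."""
    return [ord(ch) / 255.0 for ch in prompt]


def base_image(text_embedding: list, size: int = 64) -> list:
    """Stand-in for the text-to-image diffusion model: a size x size grid
    of pseudo-random values seeded by the text embedding."""
    rng = random.Random(sum(text_embedding))
    return [[rng.random() for _ in range(size)] for _ in range(size)]


def upsample(image: list, factor: int = 4) -> list:
    """Stand-in for one super-resolution stage: repeat every pixel
    `factor` times along both axes (nearest-neighbour upsampling)."""
    out = []
    for row in image:
        wide = [pixel for pixel in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def cascade(prompt: str) -> list:
    """Run the three stages and return the image at each resolution."""
    embedding = toy_text_embedding(prompt)
    img64 = base_image(embedding)    # 64 x 64
    img256 = upsample(img64)         # 256 x 256
    img1024 = upsample(img256)       # 1024 x 1024
    return [img64, img256, img1024]


if __name__ == "__main__":
    for img in cascade("A Golden Retriever dog wearing a blue checkered beret"):
        print(len(img), "x", len(img[0]))
```

Each stage multiplies the side length by 4, so two super-resolution stages take the 64×64 base image to 1024×1024.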