Launch · AI Models

DiffusionGemma: 4x faster text generation

Google's DiffusionGemma uses text diffusion to generate 256 tokens in parallel, achieving up to 4x faster inference than autoregressive models. It exceeds 1,000 tokens/s on an H100 and, when quantized, fits within 18 GB of VRAM. The model is a 26B-parameter mixture-of-experts (MoE) that activates only 3.8B parameters per forward pass.
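Text diffusion differs from autoregressive decoding because it refines a whole block of positions in parallel instead of emitting one token per forward pass. The following is a minimal sketch of confidence-based parallel unmasking, a common scheme for masked text-diffusion models. It is not DiffusionGemma's actual sampler. `toy_denoiser`, `diffusion_decode`, the step count, and the vocabulary size are all hypothetical, and a random "model" stands in for a real network.

```python
import random

MASK = -1  # sentinel for a position that has not been generated yet


def toy_denoiser(tokens, vocab_size, rng):
    """Hypothetical stand-in for a diffusion LM: for every position, propose a
    (token, confidence) pair in one "forward pass". A real model would condition
    on the prompt and on already-committed tokens; this toy samples at random."""
    return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]


def diffusion_decode(block_size=256, steps=8, vocab_size=1000, seed=0):
    """Fill a fully masked block over `steps` denoising passes (assumed values)."""
    rng = random.Random(seed)
    tokens = [MASK] * block_size
    per_step = -(-block_size // steps)  # ceil: positions committed per pass
    passes = 0
    for step in range(steps):
        proposals = toy_denoiser(tokens, vocab_size, rng)
        passes += 1
        # Rank still-masked positions by confidence and commit the most confident.
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        masked.sort(key=lambda i: proposals[i][1], reverse=True)
        # The final pass commits whatever is left so no MASK survives.
        k = per_step if step < steps - 1 else len(masked)
        for i in masked[:k]:
            tokens[i] = proposals[i][0]
    return tokens, passes


tokens, passes = diffusion_decode()
print(len(tokens), passes)
```

With these assumed defaults the 256-token block is filled in 8 forward passes, where an autoregressive decoder would need 256. The wall-clock gain in practice depends on the number of denoising steps and the cost of each full-block pass, which the brief does not specify.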
