
DiffusionGemma: 4x faster text generation

Google's DiffusionGemma uses diffusion to generate 256 tokens in parallel instead of one token at a time, achieving up to 4x faster inference than autoregressive models. The 26B-parameter mixture-of-experts (MoE) model, with 3.8B parameters active per token, is open-source under Apache 2.0 and targets speed-critical local AI workflows. When quantized, it runs on consumer GPUs with 18GB of VRAM, and on an H100 it delivers more than 1,000 tokens/s.
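A rough back-of-envelope check of those figures, as a sketch rather than anything published by Google. It assumes that all 26B weights must sit in memory (typical for MoE inference without offloading, even though only 3.8B are active), that 1 GB means 10^9 bytes, and that "1000+ tokens/s" is treated as exactly 1,000. The candidate bit-widths are illustrative guesses, since the brief does not say which quantization reaches 18GB, and the helper names are hypothetical.

```python
# Back-of-envelope arithmetic for the figures quoted in the brief.
# Assumptions (not from the source): weights-only memory, 1 GB = 1e9 bytes,
# and the listed bit-widths are illustrative quantization levels.

TOTAL_PARAMS = 26e9         # 26B total parameters (from the brief)
VRAM_GB = 18.0              # consumer GPU budget (from the brief)
BLOCK_TOKENS = 256          # tokens generated in parallel (from the brief)
H100_TOKENS_PER_S = 1000.0  # "1000+ tokens/s", taken as a lower bound
SPEEDUP = 4.0               # "up to 4x" versus autoregressive decoding


def weight_memory_gb(total_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in GB."""
    return total_params * bits_per_weight / 8 / 1e9


def seconds_per_block(block_tokens: int, tokens_per_second: float) -> float:
    """Time to emit one parallel block at a given throughput."""
    return block_tokens / tokens_per_second


def implied_baseline_tokens_per_s(tokens_per_second: float, speedup: float) -> float:
    """Autoregressive throughput implied by the quoted speedup."""
    return tokens_per_second / speedup


if __name__ == "__main__":
    for bits in (16, 8, 4):
        gb = weight_memory_gb(TOTAL_PARAMS, bits)
        verdict = "fits" if gb <= VRAM_GB else "does not fit"
        print(f"{bits}-bit weights: {gb:.1f} GB ({verdict} in {VRAM_GB:.0f} GB)")

    block_s = seconds_per_block(BLOCK_TOKENS, H100_TOKENS_PER_S)
    print(f"One {BLOCK_TOKENS}-token block: {block_s:.3f} s")

    baseline = implied_baseline_tokens_per_s(H100_TOKENS_PER_S, SPEEDUP)
    print(f"Implied autoregressive baseline: {baseline:.0f} tokens/s")
```

Under these assumptions only the 4-bit case fits the 18GB budget, with the remainder left for activations and runtime overhead. The real footprint depends on the quantization scheme and inference stack, which the brief does not specify.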

DiffusionGemma: 4x faster text generation — AIBriefs