DiffusionGemma: Google's open model generates text in parallel, up to 4x faster

DiffusionGemma generates up to 1,000 tokens/sec on a single NVIDIA H100 and more than 700 tokens/sec on an RTX 5090. The 26B-parameter mixture-of-experts (MoE) model, with 3.8B parameters active per token, is released under Apache 2.0 and fits within 18GB of VRAM when quantized.
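
To put the 18GB figure in context, here is a rough back-of-the-envelope sketch of weight memory at different precisions. The article does not state which quantization bit width is used, so the bit widths below are illustrative assumptions, and the helper name `weight_memory_gb` is hypothetical. The estimate covers the weights only; activations, caches, and runtime overhead come on top. It uses the full 26B count rather than the 3.8B active parameters, on the assumption that all experts of an MoE model are typically kept in memory.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


# Total parameter count reported in the article (26B).
TOTAL_PARAMS = 26e9

# Bit widths are assumptions for illustration; the article names none.
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gb(TOTAL_PARAMS, bits):.1f} GB")
```

Under these assumptions, 4-bit weights come to about 13 GB, which would leave some headroom below 18GB for overhead, while 8-bit weights (about 26 GB) would not fit. This is only a consistency check on the reported number, not a statement of how the model is actually quantized.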
