Launch | AI Models
Jun 10, 4:24 PM
DiffusionGemma: 4x faster text generation
Google's DiffusionGemma uses text diffusion to generate 256 tokens in parallel, reaching up to 4x faster inference than autoregressive models. It exceeds 1,000 tokens/s on an H100 and fits within 18GB of VRAM when quantized. The 26B-parameter MoE model activates only 3.8B parameters per forward pass.
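As a rough sanity check on those figures, the sketch below computes the share of parameters active per forward pass and the weight-only memory footprint at a few bit widths. The bit widths are assumptions for illustration, since the announcement does not say which quantization scheme is used, and the estimate ignores activations, caches, and runtime overhead.

```python
# Back-of-the-envelope arithmetic for the figures quoted in the announcement.
# The parameter counts and the 18 GB budget come from the text; the bit widths
# tried below are assumptions, not a stated property of the model.

TOTAL_PARAMS = 26e9      # 26B total parameters (from the announcement)
ACTIVE_PARAMS = 3.8e9    # 3.8B active per forward pass (from the announcement)
VRAM_BUDGET_GB = 18      # quoted VRAM budget when quantized


def active_fraction(total: float, active: float) -> float:
    """Share of parameters used in a single forward pass."""
    return active / total


def weight_memory_gb(params: float, bits_per_weight: int) -> float:
    """Weight-only memory in GB (1 GB = 1e9 bytes); excludes activations."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"Active share per forward pass: {frac:.1%}")
    for bits in (16, 8, 4):  # hypothetical quantization levels
        gb = weight_memory_gb(TOTAL_PARAMS, bits)
        verdict = "fits" if gb <= VRAM_BUDGET_GB else "does not fit"
        print(f"{bits}-bit weights: {gb:.1f} GB ({verdict} in {VRAM_BUDGET_GB} GB)")
```

Under these assumptions, only a roughly 4-bit weight format leaves the full 26B parameters inside the 18GB budget, and all weights still need to be resident even though only about one in seven is used per pass.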
