Launch · AI Models
8 days ago
DiffusionGemma: 4x faster text generation
Google's DiffusionGemma uses diffusion to generate 256 tokens simultaneously, achieving up to 4x faster inference than autoregressive models. The 26B-parameter MoE model (3.8B active) is released as open source under Apache 2.0 and targets speed-critical local AI workflows. When quantized, it runs on consumer GPUs with 18 GB of VRAM; on an H100 it delivers 1000+ tokens/s.
