Launch · AI Models
Jun 10, 4:16 PM
DiffusionGemma: Google's open model generates text in parallel, up to 4x faster
Generates up to 1,000 tokens/sec on a single NVIDIA H100 and 700+ tokens/sec on an RTX 5090. The 26B MoE model (3.8B active parameters) is released under Apache 2.0 and fits within 18 GB of VRAM when quantized.
