Boost Inference Performance up to 15x on NVIDIA Blackwell Using DFlash Speculative…

Launch, AI Models

Jun 23, 3:00 PM

DFlash, an open-source block diffusion drafter for speculative decoding, boosts inference performance for gpt-oss-120b on NVIDIA Blackwell by up to 15x. Compared with EAGLE-3, it nearly doubles interactivity for Llama 3.1 8B. Twenty checkpoints are available on Hugging Face.
