Jun 24, 7:21 AM
DFlash speculative decoding enables up to 15x throughput on NVIDIA Blackwell
DFlash is a speculative decoding method that drafts blocks of multiple tokens simultaneously rather than one token at a time. It achieves up to 15x higher throughput on NVIDIA Blackwell GPUs. The method targets the latency of autoregressive LLMs, especially Chain-of-Thought reasoning models.
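To make the draft-and-verify idea concrete, here is a minimal Python sketch of generic greedy speculative decoding with block drafts. It is not DFlash's actual algorithm or API. The names `target_next`, `draft_block`, `autoregressive`, and `speculative` are hypothetical, and both "models" are toy arithmetic functions chosen only so the example runs without a GPU. The point it illustrates is that a block of drafted tokens can be checked in one verification pass, so the output matches ordinary one-token-at-a-time decoding while needing fewer passes of the large model.

```python
from typing import List, Tuple

Token = int


def target_next(ctx: List[Token]) -> Token:
    """Toy stand-in for the large model: a deterministic function of the context."""
    return (sum(ctx) * 31 + len(ctx)) % 50


def draft_block(ctx: List[Token], k: int) -> List[Token]:
    """Toy stand-in for a block drafter: proposes k tokens per call.

    A real block drafter would emit the block in parallel; this loop only
    simulates its output. It deliberately errs whenever the context length
    is a multiple of 4, so that some drafted tokens get rejected.
    """
    out: List[Token] = []
    cur = list(ctx)
    for _ in range(k):
        tok = target_next(cur)
        if len(cur) % 4 == 0:
            tok = (tok + 1) % 50  # injected drafting mistake
        out.append(tok)
        cur.append(tok)
    return out


def autoregressive(prompt: List[Token], n: int) -> List[Token]:
    """Baseline: one target-model call per generated token."""
    seq = list(prompt)
    for _ in range(n):
        seq.append(target_next(seq))
    return seq[len(prompt):]


def speculative(prompt: List[Token], n: int, k: int) -> Tuple[List[Token], int]:
    """Draft k tokens, then verify the whole block in one (simulated) target pass."""
    seq = list(prompt)
    passes = 0
    while len(seq) - len(prompt) < n:
        block = draft_block(seq, k)
        passes += 1  # one verification pass scores every position in the block
        accepted: List[Token] = []
        for tok in block:
            expected = target_next(seq + accepted)
            if tok == expected:
                accepted.append(tok)       # draft agrees with the target
            else:
                accepted.append(expected)  # first mismatch: keep the target's token
                break
        else:
            # Entire block accepted: the same pass also yields one extra token.
            accepted.append(target_next(seq + accepted))
        seq.extend(accepted)
    return seq[len(prompt):][:n], passes


if __name__ == "__main__":
    tokens, passes = speculative([1, 2, 3], n=20, k=4)
    print(tokens == autoregressive([1, 2, 3], 20), passes)
```

In this sketch every emitted token is the one the target would have chosen, so the speedup comes only from accepting several drafted tokens per verification pass. How much a real system gains depends on how often drafts are accepted and on the hardware.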
