JetSpec's parallel tree decoding speeds up Qwen3-8B by up to 9.64x

Jun 22, 8:00 PM

On MATH-500 with Qwen3-8B, JetSpec achieves up to a 9.64x speedup at a budget of 256 using parallel tree drafting. In real serving, it reaches roughly 1,000 tokens per second (TPS) on a single B200 GPU. The method trains a causal parallel draft head over fused hidden states, and the target model verifies the drafted tokens, so generation stays lossless.
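
To make "lossless verification" of a draft tree concrete, here is a minimal sketch of generic greedy tree verification as used in tree-based speculative decoding. It is not JetSpec's code, which the post does not show. `DraftNode`, `verify_tree_greedy`, and the toy target model are hypothetical names for illustration. The sketch covers greedy decoding only (sampling-based verification uses a rejection-sampling rule instead), and it queries the target once per step for clarity, whereas a real system scores every tree node in a single forward pass using a tree attention mask.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence


@dataclass
class DraftNode:
    token: int   # token proposed by the draft head
    parent: int  # index of the parent node; -1 means "child of the current prefix"


def verify_tree_greedy(
    prefix: Sequence[int],
    nodes: Sequence[DraftNode],
    target_next_token: Callable[[List[int]], int],
) -> List[int]:
    """Walk the draft tree, keeping only tokens the target model would pick greedily.

    Returns the accepted draft tokens followed by one extra token produced by
    the target model itself, so the result always equals plain greedy decoding.
    """
    # Build a child list for every node, plus the virtual root (-1).
    children: Dict[int, List[int]] = {i: [] for i in range(-1, len(nodes))}
    for i, node in enumerate(nodes):
        children[node.parent].append(i)

    seq = list(prefix)
    accepted: List[int] = []
    current = -1  # start at the virtual root
    while True:
        expected = target_next_token(seq)
        # Follow the child whose drafted token matches the target's choice.
        match: Optional[int] = next(
            (c for c in children[current] if nodes[c].token == expected), None
        )
        accepted.append(expected)
        if match is None:
            # No drafted child matches (or we reached a leaf): the target's own
            # token is kept as the final "bonus" token and verification stops.
            return accepted
        seq.append(expected)
        current = match


# Hypothetical toy target model: the greedy next token is (last token + 1) mod 10.
def toy_target(seq: List[int]) -> int:
    return (seq[-1] + 1) % 10


tree = [
    DraftNode(token=4, parent=-1),  # node 0: correct first guess
    DraftNode(token=7, parent=-1),  # node 1: wrong sibling branch
    DraftNode(token=5, parent=0),   # node 2: correct continuation of node 0
    DraftNode(token=9, parent=0),   # node 3: wrong sibling
    DraftNode(token=8, parent=2),   # node 4: wrong guess, target wants 6
]
print(verify_tree_greedy([3], tree, toy_target))  # [4, 5, 6]
```

Here two drafted tokens are accepted and the target contributes a third, matching what three steps of ordinary greedy decoding would produce. The speedup in this family of methods comes from accepting several tokens per target forward pass; a wider or deeper tree raises the chance of a long accepted path at the cost of verifying more nodes.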

JetSpec