Jun 22, 8:00 PM
JetSpec parallel tree decoding speeds up Qwen3-8B by up to 9.64x
On MATH-500, JetSpec achieves a speedup of up to 9.64x with a budget of 256, using parallel tree drafting. In real serving, it reaches roughly 1,000 tokens per second (TPS) on a single B200 GPU. The method trains a causal parallel draft head over fused hidden states, and its drafts are then verified losslessly.
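To make "lossless verification" of a draft tree concrete, here is a minimal Python sketch under greedy decoding. It is not JetSpec's implementation: `verify_tree`, `toy_target`, and the tree layout (`tokens` plus `parents`) are hypothetical names chosen for illustration, and the target model is a toy function. A real system would score every tree node in a single target forward pass rather than calling the model step by step.

```python
from typing import Callable, Dict, List, Sequence


def toy_target(context: Sequence[int]) -> int:
    """Hypothetical stand-in for the target model's greedy next token."""
    return (context[-1] + 1) % 10


def verify_tree(
    prefix: Sequence[int],
    tokens: Sequence[int],
    parents: Sequence[int],
    target_next: Callable[[Sequence[int]], int],
) -> List[int]:
    """Walk a draft tree, keeping only tokens the target would emit itself.

    tokens[i] is the draft token at node i; parents[i] is the index of its
    parent node, or -1 if the node hangs directly off the prefix.
    """
    # Group node indices by parent so each step can look up its candidates.
    children: Dict[int, List[int]] = {}
    for node, parent in enumerate(parents):
        children.setdefault(parent, []).append(node)

    accepted: List[int] = []
    current = -1  # start at the virtual root (the end of the prefix)
    while True:
        # The target's own greedy choice is always what gets emitted,
        # so the output equals plain greedy decoding (hence "lossless").
        want = target_next(list(prefix) + accepted)
        accepted.append(want)
        # Continue only if some drafted child guessed that token.
        match = next(
            (c for c in children.get(current, []) if tokens[c] == want), None
        )
        if match is None:
            return accepted
        current = match


# Draft tree: two root candidates (4, 9); node 0 has children (5, 2);
# node 2 has a single child (8).
draft_tokens = [4, 9, 5, 2, 8]
draft_parents = [-1, -1, 0, 0, 2]
result = verify_tree([3], draft_tokens, draft_parents, toy_target)
print(result)  # [4, 5, 6]
```

In this toy run, two drafted tokens (4 and 5) are accepted and the target supplies a third (6) where the draft diverges, so one verification round yields three tokens instead of one. A wider tree raises the chance that some branch matches the target, which is presumably what a larger budget is meant to buy.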
