JetSpec achieves 9.64x speedup on MATH-500 with parallel tree drafting

Jun 22, 8:00 PM

JetSpec co-optimizes drafting cost and draft quality with a causal parallel draft head, which scores candidate trees according to the target model's autoregressive factorization. On Qwen3-8B with a budget of 256, it reaches speedups of up to 9.64x on MATH-500 and 4.58x on open-ended chat, with a throughput of about 1,000 tokens per second (TPS) on a single B200 GPU.
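Scoring candidate trees by the target model's autoregressive factorization can be illustrated with a small sketch. Each node is scored by the product of the conditional probabilities along its path from the root, p(x_1, ..., x_k) = p(x_1) p(x_2 | x_1) ... p(x_k | x_<k). A fixed budget of nodes is then kept by best-first expansion. The `DraftNode` structure, the `select_subtree` function, and the greedy pruning rule are hypothetical illustrations and are not JetSpec's published implementation. In a real system, the conditional probabilities would come from the draft head.

```python
from __future__ import annotations

import heapq
import math
from dataclasses import dataclass, field


@dataclass
class DraftNode:
    """One candidate token in a draft tree (hypothetical structure)."""
    token: int
    cond_prob: float  # assumed draft estimate of p(token | path so far)
    children: list[DraftNode] = field(default_factory=list)


def select_subtree(root_children: list[DraftNode], budget: int) -> list[tuple[int, float]]:
    """Keep up to `budget` nodes with the highest path probability.

    Path probability is the product of conditional probabilities along the
    path (summed in log space). A child is only pushed after its parent is
    selected, so the kept nodes always form a connected tree.
    """
    heap: list[tuple[float, int, DraftNode, float]] = []
    counter = 0  # tie-breaker so DraftNode objects are never compared
    for child in root_children:
        lp = math.log(child.cond_prob)
        heapq.heappush(heap, (-lp, counter, child, lp))
        counter += 1

    selected: list[tuple[int, float]] = []
    while heap and len(selected) < budget:
        _, _, node, lp = heapq.heappop(heap)  # highest path probability first
        selected.append((node.token, math.exp(lp)))
        for child in node.children:
            child_lp = lp + math.log(child.cond_prob)
            heapq.heappush(heap, (-child_lp, counter, child, child_lp))
            counter += 1
    return selected


# Toy tree: two first-token candidates, each with two continuations.
tree = [
    DraftNode(10, 0.6, [DraftNode(11, 0.5), DraftNode(12, 0.4)]),
    DraftNode(20, 0.25, [DraftNode(21, 0.9), DraftNode(22, 0.1)]),
]
print([tok for tok, _ in select_subtree(tree, budget=4)])  # [10, 11, 20, 12]
```

In this toy tree, token 12 (path probability 0.6 × 0.4 = 0.24) is kept ahead of token 21 (0.25 × 0.9 = 0.225). This shows how the factorized score can favor a deeper branch under a likely prefix over a shallower branch under a less likely one.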
