Analysis · AI Models
Jun 22, 8:00 PM
JetSpec achieves 9.64x speedup on MATH-500 with parallel tree drafting
JetSpec co-optimizes drafting cost and draft quality with a causal parallel draft head that scores candidate trees according to the target model's autoregressive factorization. On Qwen3-8B with a budget of 256, it reaches speedups of up to 9.64x on MATH-500 and 4.58x on open-ended chat, with throughput of about 1000 tokens per second (TPS) on a single B200 GPU.
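To make "scoring candidate trees according to the autoregressive factorization" concrete, here is a minimal sketch, not JetSpec's implementation. It assumes each tree node carries a conditional probability for its token given its ancestors, scores a path by the sum of log-conditionals (the log of the product p(x1) · p(x2 | x1) · ...), and keeps the highest-scoring nodes up to a budget. The names `Node` and `select_tree` and the probabilities are hypothetical, chosen only for illustration.

```python
import heapq
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    token: str
    prob: float  # assumed conditional probability of token given its ancestors (> 0)
    children: list = field(default_factory=list)


def select_tree(roots, budget):
    """Best-first selection of up to `budget` tree nodes.

    Each node is scored by the joint log-probability of its root-to-node
    path, i.e. the sum of log-conditionals along the path. A child is only
    pushed after its parent is selected, so the result is a connected tree.
    Returns a list of (path_tokens, joint_logprob), best first.
    """
    heap = []
    counter = 0  # tie-breaker so heapq never has to compare Node objects
    for root in roots:
        heapq.heappush(heap, (-math.log(root.prob), counter, (root.token,), root))
        counter += 1

    selected = []
    while heap and len(selected) < budget:
        neg_logp, _, path, node = heapq.heappop(heap)
        selected.append((path, -neg_logp))
        for child in node.children:
            child_neg_logp = neg_logp - math.log(child.prob)
            heapq.heappush(heap, (child_neg_logp, counter, path + (child.token,), child))
            counter += 1
    return selected


# Toy candidate tree with made-up probabilities.
roots = [
    Node("a", 0.6, [Node("b", 0.7), Node("c", 0.2)]),
    Node("d", 0.3, [Node("e", 0.9)]),
]
for path, logp in select_tree(roots, budget=3):
    print(" ".join(path), round(math.exp(logp), 2))
```

In a speculative decoding loop, the selected tree would then be verified by the target model; that step is omitted here.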
