llama.cpp adds EAGLE3 speculative decoding for Qwen3.5/3.6

Jun 13, 9:06 PM

llama.cpp now supports EAGLE3 speculative decoding through the --spec-type draft-eagle3 flag, which speeds up inference for Qwen3.5 and Qwen3.6 models. A separate pull request improved MTP (multi-token prediction) performance by using post-norm hidden states. vLLM also shipped a new streaming parser in its nightly build to fix mid-turn stopping and tool-call issues with Qwen3+ models.
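
For readers new to the idea, the draft-and-verify loop behind speculative decoding can be sketched in a few lines. This is a toy, greedy version under stated assumptions: `target`, `draft`, `speculative_step`, and `generate` are hypothetical names invented for illustration, not llama.cpp or vLLM APIs, and the sketch does not model EAGLE3's specific draft head. It only shows why the output matches plain greedy decoding with the target model while needing fewer target steps when the draft guesses well.

```python
from typing import Callable, List

Token = int
# Hypothetical interface: a greedy "model" maps a token context to its next token.
NextToken = Callable[[List[Token]], Token]


def speculative_step(target: NextToken, draft: NextToken,
                     context: List[Token], k: int = 4) -> List[Token]:
    """Propose k draft tokens, keep the prefix the target agrees with,
    then append one token chosen by the target itself."""
    proposed: List[Token] = []
    for _ in range(k):
        proposed.append(draft(context + proposed))

    accepted: List[Token] = []
    for tok in proposed:
        # A real engine would score all k positions in one batched target
        # pass; the target is called per position here for clarity.
        expected = target(context + accepted)
        if tok != expected:
            accepted.append(expected)  # the target's correction ends the step
            return accepted
        accepted.append(tok)

    # Every draft token matched, so the target contributes one extra token.
    accepted.append(target(context + accepted))
    return accepted


def generate(target: NextToken, draft: NextToken,
             prompt: List[Token], n_tokens: int, k: int = 4) -> List[Token]:
    """Repeat speculative steps until n_tokens new tokens exist."""
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        out.extend(speculative_step(target, draft, out, k))
    return out[:len(prompt) + n_tokens]


# Toy models (assumptions for the demo): the target counts upward modulo 10;
# the draft agrees except right after a 5, where it guesses wrong.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 5 else (ctx[-1] + 1) % 10
```

Calling `generate(toy_target, toy_draft, [1], 12)` yields the same sequence the target would produce on its own; the draft only changes how many tokens each step can commit.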

qwen35: use post-norm hidden state for MTP by am17an · Pull Request #24025 · ggml-org/llama.cpp · via r/LocalLLaMA · 18 days ago · jacek2023

The Eagle(3) has landed (for Qwen) · 3 days ago · Legitimate-Dog5690
