
llama.cpp adds EAGLE3 speculative decoding for Qwen3.5/3.6

EAGLE3 speculative decoding is now supported in llama.cpp via the --spec-type draft-eagle3 flag, enabling faster inference for Qwen3.5 and Qwen3.6 models. A separate PR improved MTP (multi-token prediction) performance by using post-norm hidden states. vLLM also released a new streaming parser in its nightly builds that fixes two problems with Qwen3 and newer models: responses stopping mid-turn, and tool calls.
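For readers new to speculative decoding, the sketch below shows the generic greedy draft-and-verify loop that methods like EAGLE3 build on. It is not llama.cpp's implementation, and it does not model EAGLE3's own lightweight draft head. The toy next-token functions (`toy_draft`, `toy_target`) and the helpers `speculative_step` and `generate` are hypothetical names chosen for illustration.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token sequence to the next
# token it would pick greedily.
NextToken = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft: NextToken,
                     target: NextToken, k: int) -> List[int]:
    """One greedy draft-and-verify round; returns the tokens to append."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2. The target checks each drafted position. A real engine scores all
    #    positions in one batched forward pass; this toy loops instead.
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target(ctx)
        if expected != tok:
            # First mismatch: keep the target's own token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every draft matched, so the target's next prediction comes free.
    accepted.append(target(ctx))
    return accepted


def generate(prompt: List[int], draft: NextToken, target: NextToken,
             k: int, max_new: int) -> List[int]:
    """Repeat speculative rounds until max_new tokens have been produced."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        out.extend(speculative_step(out, draft, target, k))
    return out[: len(prompt) + max_new]


def toy_target(seq: List[int]) -> int:
    # Toy "large model": always predicts the next digit, wrapping at 10.
    return (seq[-1] + 1) % 10


def toy_draft(seq: List[int]) -> int:
    # Toy "small model": agrees with the target except right after a 4.
    return 0 if seq[-1] == 4 else (seq[-1] + 1) % 10


if __name__ == "__main__":
    print(speculative_step([0], toy_draft, toy_target, k=4))  # [1, 2, 3, 4, 5]
    print(speculative_step([2], toy_draft, toy_target, k=4))  # [3, 4, 5]
    print(generate([2], toy_draft, toy_target, k=4, max_new=12))
```

Every kept token is one the target would have chosen greedily, so the output matches running the target alone. The speedup comes from accepting several drafted tokens per target pass. Sampling-based speculative decoding replaces the exact-match check with a probabilistic acceptance rule.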
