
llama.cpp PR improves MTP for Qwen 3.5

Pull request #24025 by am17an uses the post-norm hidden state to speed up multi-token prediction (MTP). The change targets Qwen 3.5 models in llama.cpp.

7 days ago