
llama.cpp PR uses post-norm hidden state for qwen35 MTP

A pull request to llama.cpp changes multi-token prediction (MTP) for Qwen models so that MTP takes the post-norm hidden state as its input, which could speed up inference. The change applies only to llama.cpp's qwen35 model support.
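The sketch below is a minimal illustration of the pre-norm vs post-norm distinction. It assumes that "post-norm" means the output of the model's final RMSNorm (the vector the LM head reads) and that the alternative is the last transformer layer's raw output. It also assumes plain RMSNorm for simplicity. The function names are hypothetical and are not llama.cpp's API; llama.cpp implements its model graphs in C++.

```python
import math
from typing import List

Vector = List[float]


def rms_norm(x: Vector, weight: Vector, eps: float = 1e-6) -> Vector:
    """Plain RMSNorm: divide x by sqrt(mean(x^2) + eps), then scale by a learned weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def mtp_hidden_input(h_last: Vector, final_norm_weight: Vector, post_norm: bool) -> Vector:
    """Choose which hidden state a (hypothetical) MTP module receives.

    post_norm=False: the raw output of the last transformer layer.
    post_norm=True:  the same vector after the final norm (assumed here to be
                     the input the LM head sees).
    """
    if post_norm:
        return rms_norm(h_last, final_norm_weight)
    return list(h_last)


if __name__ == "__main__":
    h = [3.0, 4.0]      # toy last-layer hidden state (illustrative values)
    w = [1.0, 1.0]      # toy final-norm weight
    print(mtp_hidden_input(h, w, post_norm=False))
    print(mtp_hidden_input(h, w, post_norm=True))
```

With `post_norm=True` the MTP input has roughly unit RMS (scaled by the norm weight), so it lives on the same scale as the LM head's input rather than on the unnormalized residual-stream scale.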

14 days ago
llama.cpp PR uses post-norm hidden state for qwen35 MTP — AIBriefs