14 days ago
llama.cpp PR uses post-norm hidden state for qwen35 MTP
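A llama.cpp pull request changes multi-token prediction (MTP) for Qwen models so that the MTP head receives the post-norm hidden state, meaning the hidden state after normalization has been applied. The change could speed up inference. It applies only to llama.cpp's qwen35 support.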
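The following minimal Python sketch illustrates what "post-norm" means here. It is not llama.cpp's code. It compares two possible inputs to an MTP head: the raw output of the last transformer block (pre-norm), and that same vector after the model's final normalization (post-norm). The sketch assumes the final norm is RMSNorm, as in Qwen-family models. The names `rms_norm`, `mtp_input`, and `use_post_norm` are hypothetical and exist only for illustration.

```python
import math
from typing import List


def rms_norm(x: List[float], weight: List[float], eps: float = 1e-6) -> List[float]:
    """RMSNorm: divide x by its root-mean-square, then scale by a learned weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def mtp_input(last_hidden: List[float], final_norm_weight: List[float],
              use_post_norm: bool) -> List[float]:
    """Hypothetical helper: pick which hidden state is handed to the MTP head.

    use_post_norm=False -> raw output of the last transformer block (pre-norm)
    use_post_norm=True  -> that output after the final RMSNorm (post-norm)
    """
    if use_post_norm:
        return rms_norm(last_hidden, final_norm_weight)
    return list(last_hidden)


if __name__ == "__main__":
    h = [3.0, 4.0]   # toy hidden state
    w = [1.0, 1.0]   # toy final-norm weight
    print("pre-norm: ", mtp_input(h, w, use_post_norm=False))
    print("post-norm:", mtp_input(h, w, use_post_norm=True))
```

In the post-norm case, the MTP head sees a vector with roughly unit root-mean-square (before the learned weight is applied), matching what the model's own output layer sees. This is one plausible reason the choice can matter for draft quality, but the sketch does not reflect any measured result from the PR.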
