Analysis, Developers
7 days ago
llama.cpp PR improves MTP for Qwen 3.5
Pull request #24025 by am17an uses the post-norm hidden state to speed up multi-token prediction (MTP). The change targets Qwen 3.5 models in llama.cpp.