Jun 3, 5:34 PM
llama.cpp PR speeds up qwen35 MTP with post-norm hidden state
A pull request to llama.cpp introduces a post-norm hidden state for Multi-Token Prediction (MTP) in qwen35 models, with the aim of speeding up MTP inference.