Analysis · Developers

llama.cpp PR speeds up qwen35 MTP with post-norm hidden state

A pull request to llama.cpp introduces the use of the post-norm hidden state for Multi-Token Prediction (MTP) in qwen35 models, with the aim of speeding up MTP inference.

Jun 3, 5:34 PM