User reports worse quality with MTP on Qwen 3.6 and Gemma 4

Jun 25, 7:10 AM

A Reddit user self-hosting Qwen 3.6 27B with llama.cpp reports that enabling Multi-Token Prediction (MTP) produced worse output than non-MTP inference in roughly 8 of 10 test cases. This contrasts with the usual expectation that enabling MTP is an improvement.
