Community maps KV cache quantization for Gemma 4 and Qwen 3.6


14 hours ago


Reddit users benchmarked the KL divergence (KLD) introduced by KV cache quantization on Gemma 4 and Qwen 3.6 models. They found q8/q8 nearly free on both, while q4/q4 is usable on Qwen but catastrophic on Gemma. The results also show that QAT variants of Gemma 4 tolerate KV cache quantization significantly better.
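For readers unfamiliar with the metric, here is a minimal sketch of how a mean KLD figure of this kind can be computed. It assumes you already have per-token logits from a baseline run and from a run with a quantized KV cache. The function names (`log_softmax`, `kl_divergence`, `mean_kld`) and the toy logits are hypothetical and illustrative only; they are not taken from the Reddit threads and do not reproduce any reported numbers.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized row."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_divergence(base_logits, quant_logits):
    """KL(P_base || P_quant) in nats for a single token position."""
    log_p = log_softmax(base_logits)
    log_q = log_softmax(quant_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def mean_kld(base_rows, quant_rows):
    """Average per-token KLD over a sequence of logit rows."""
    if len(base_rows) != len(quant_rows):
        raise ValueError("both runs must cover the same token positions")
    values = [kl_divergence(b, q) for b, q in zip(base_rows, quant_rows)]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Toy logits, invented for illustration (NOT measured data).
    baseline = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]
    quantized = [[1.9, 1.1, 0.1], [0.6, 2.3, 0.4]]
    print(f"mean KLD: {mean_kld(baseline, quantized):.6f} nats")
```

A value near zero means the quantized cache leaves the model's next-token distribution almost unchanged, which is the sense in which a setting can be called "nearly free"; larger values indicate the output distribution is drifting away from the baseline.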

Thoughts on Gemma4 12b vs 26a4b, which one is better? (Adventurous-Gold6413, 15 days ago)

Qwen 3.6 35B-A3B @ Q4 or Gemma 4 12B @ Q8? (mailto_devnull, 9 days ago)

Shoutout to Gemma4 as a conversational assistant / agent (goldcakes, 25 days ago)

Gemma 4 26B A4B IT QAT Comparison (GoodTip7897, 14 days ago)

PSA: Gemma 4 12B is NOT completely broken for coding and tool calling, you need a special chat template (boutell, 18 days ago)

Someone awhile ago did a quant shootout for Qwen3.6, I did shoddy math on it (again) (Diablo-D3, 7 days ago)

I mapped the KLD of KV cache quantization for Qwen3.6-35B-A3B and Gemma4-E2B QAT (crusaderky, 9 hours ago)

