Community maps KV cache quantization for Gemma 4 and Qwen 3.6
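Reddit users benchmarked the Kullback-Leibler divergence (KLD) introduced by KV cache quantization on Gemma 4 and Qwen 3.6 models. They found q8/q8 (8-bit keys and 8-bit values) nearly lossless on both model families. In contrast, q4/q4 remained usable on Qwen but was catastrophic on Gemma. The results also show that the quantization-aware-trained (QAT) variants of Gemma 4 tolerate KV cache quantization significantly better than the standard variants.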
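To show what such a KLD measurement captures, here is a minimal toy sketch, not the benchmark's actual method. It makes several assumptions. It runs a single attention step over random Gaussian keys and values instead of a real Gemma or Qwen model. It uses a simplified symmetric round-to-nearest quantizer with one scale per 32-element block, which only loosely resembles block formats such as llama.cpp's q8_0/q4_0 cache types and does not reproduce their exact layout. It then compares the next-token distribution computed from the full-precision cache against the one computed from the quantized cache. All function names (`quantize_block`, `kv_quant_kld`, and so on) are hypothetical illustrations.

```python
import math
import random


def quantize_block(xs, bits, block=32):
    """Hypothetical simplified quantizer: symmetric round-to-nearest, one scale per block.

    Returns dequantized floats so the error can be fed back into attention.
    Not the exact q8_0/q4_0 storage format."""
    qmax = 2 ** (bits - 1) - 1  # 127 levels for 8-bit, 7 for 4-bit
    out = []
    for i in range(0, len(xs), block):
        chunk = xs[i:i + block]
        amax = max(abs(v) for v in chunk)
        scale = amax / qmax if amax > 0 else 1.0
        # Round each value to the nearest integer level, then scale back to float.
        out.extend(max(-qmax, min(qmax, round(v / scale))) * scale for v in chunk)
    return out


def log_softmax(logits):
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def kl_divergence(p_logits, q_logits):
    """KL(P || Q) in nats, with P the full-precision reference distribution."""
    lp, lq = log_softmax(p_logits), log_softmax(q_logits)
    return sum(math.exp(a) * (a - b) for a, b in zip(lp, lq))


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query over the cache."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = [math.exp(s) for s in log_softmax(scores)]
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d)]


def next_token_logits(query, keys, values, w_out):
    h = attend(query, keys, values)
    return [sum(w * x for w, x in zip(row, h)) for row in w_out]


def kv_quant_kld(k_bits, v_bits, seq_len=64, dim=64, vocab=100, seed=0):
    """Toy KLD between outputs from a full-precision and a quantized KV cache."""
    rng = random.Random(seed)

    def gauss(n):
        return [rng.gauss(0.0, 1.0) for _ in range(n)]

    query = gauss(dim)
    keys = [gauss(dim) for _ in range(seq_len)]
    values = [gauss(dim) for _ in range(seq_len)]
    w_out = [gauss(dim) for _ in range(vocab)]  # stand-in output projection

    ref = next_token_logits(query, keys, values, w_out)
    qkeys = [quantize_block(k, k_bits) for k in keys]
    qvals = [quantize_block(v, v_bits) for v in values]
    test = next_token_logits(query, qkeys, qvals, w_out)
    return kl_divergence(ref, test)


if __name__ == "__main__":
    for kb, vb in [(8, 8), (4, 4)]:
        print(f"q{kb}/q{vb}: KLD = {kv_quant_kld(kb, vb):.6f} nats")
```

On random data like this, the 4-bit cache produces a much larger KLD than the 8-bit cache, because the per-element rounding error grows as the number of levels shrinks. The toy model cannot reproduce the model-specific gap the benchmarks report between Gemma and Qwen. That gap depends on properties of each real model's keys and values that random data does not have.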
