Gemma 4 QAT shows better response to KV cache quantization

Jun 21, 8:48 AM

KL divergence results on wikitext at 16k context show that Gemma 4 QAT models running with a Q8_0-quantized KV cache diverge little from full precision. This suggests that quantization-aware training (QAT) mitigates the model's previously reported sensitivity to KV cache quantization.
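
As a rough illustration of the metric, here is a minimal Python sketch of how a mean KL divergence between two runs of the same model could be computed. It assumes the usual setup, in which next-token logits are collected at each position once with a full-precision KV cache (the reference) and once with a quantized one, and the per-position KL(reference || quantized) is averaged. The function names `log_softmax`, `token_kl`, and `mean_kl` and the toy logits are hypothetical; this is not the code behind the reported results.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def token_kl(ref_logits, test_logits):
    """KL(P_ref || P_test) for a single token position, in nats."""
    ref_lp = log_softmax(ref_logits)
    test_lp = log_softmax(test_logits)
    return sum(math.exp(r) * (r - t) for r, t in zip(ref_lp, test_lp))

def mean_kl(ref_positions, test_positions):
    """Average per-position KL divergence over an evaluated sequence."""
    if len(ref_positions) != len(test_positions):
        raise ValueError("both runs must cover the same token positions")
    kls = [token_kl(r, t) for r, t in zip(ref_positions, test_positions)]
    return sum(kls) / len(kls)

# Toy data (made up): two positions over a four-token vocabulary.
ref_run = [[2.0, 0.5, -1.0, 0.0], [0.1, 0.2, 3.0, -0.5]]
quant_run = [[1.9, 0.6, -1.1, 0.0], [0.1, 0.3, 2.9, -0.4]]

print(f"mean KL: {mean_kl(ref_run, quant_run):.6f} nats")
```

A value near zero means the quantized-cache run assigns almost the same next-token probabilities as the reference; identical logits give exactly zero.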