Analysis · AI Models
Jun 21, 8:48 AM
Gemma 4 QAT shows better response to KV cache quantization
KL divergence measurements on WikiText at 16k context show that Gemma 4 QAT models running with a Q8_0-quantized KV cache diverge little from full precision. This suggests that quantization-aware training (QAT) mitigates the model's previously reported sensitivity to KV cache quantization.
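For readers unfamiliar with the metric, here is a minimal sketch of how a mean per-token KL divergence between a full-precision run and a quantized-cache run can be computed from next-token logits. It is not the tooling behind the reported results; the function names and the toy logits are hypothetical and chosen only for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one row of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_kl(base_logits, test_logits):
    """KL(P || Q) in nats for one token position.

    P comes from the reference (full-precision) run,
    Q from the run under test (e.g. a quantized KV cache).
    """
    log_p = log_softmax(base_logits)
    log_q = log_softmax(test_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def mean_kl(base_rows, test_rows):
    """Average the per-token KL divergence over all positions."""
    if not base_rows or len(base_rows) != len(test_rows):
        raise ValueError("need two non-empty logit lists of equal length")
    kls = [token_kl(b, t) for b, t in zip(base_rows, test_rows)]
    return sum(kls) / len(kls)


# Toy example (made-up logits, vocabulary of 3, two token positions).
baseline = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
quantized = [[1.9, 0.6, -1.0], [0.1, 0.3, 2.9]]
print(f"mean KL: {mean_kl(baseline, quantized):.6f} nats")
```

A value near zero means the run under test assigns almost the same next-token probabilities as the reference; larger values indicate that quantization is changing the model's predictions.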
