User maps KLD of KV cache quantization for Qwen3.6-35B-A3B and Gemma4-E2B

Analysis, AI Models

Jun 23, 3:12 PM

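The analysis compares KV cache quantization methods (q8, q4, turbo4, turbo3, turbo2) on Qwen3.6-35B-A3B and Gemma4-E2B, using KL divergence (KLD) as the measure of how much each method changes the model's output. q8 is nearly free on both models, while q4 is catastrophic on Gemma but usable on Qwen.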
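For readers unfamiliar with the metric, here is a minimal sketch of how a per-token KLD between a baseline run and a quantized-cache run could be computed. It assumes you already have the raw logits from both runs at the same token positions; the function names (`log_softmax`, `token_kld`, `mean_kld`) are illustrative and not taken from the original analysis, whose exact tooling and baseline are not stated.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over one vocabulary-sized list of logits.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def token_kld(base_logits, quant_logits):
    # KL(P_base || P_quant) in nats for a single token position.
    # P_base: distribution from the reference (unquantized) KV cache run.
    # P_quant: distribution from the quantized KV cache run.
    log_p = log_softmax(base_logits)
    log_q = log_softmax(quant_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def mean_kld(base_rows, quant_rows):
    # Average the per-token KLD over all evaluated positions.
    values = [token_kld(b, q) for b, q in zip(base_rows, quant_rows)]
    return sum(values) / len(values)
```

A value near zero means the quantized cache leaves the next-token distribution essentially unchanged, which is the sense in which a method can be called "nearly free"; larger values indicate the quantized run is drifting away from the baseline.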