Analysis · AI Models
Jun 23, 3:12 PM
User maps KLD of KV cache quantization for Qwen3.6-35B-A3B and Gemma4-E2B
The analysis compares five KV cache quantization methods (q8, q4, turbo4, turbo3, turbo2) on Qwen3.6-35B-A3B and Gemma4-E2B, using KL divergence (KLD) as the quality metric. q8 is nearly free on both models, while q4 is catastrophic on Gemma but usable on Qwen.
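For readers unfamiliar with the metric, here is a minimal sketch of how a per-token KLD between a baseline run and a quantized-cache run could be computed. The summary does not say how the author measured it, so treat this as an assumption: it takes the next-token logits from both runs at the same positions and averages KL(baseline || quantized). The function names `log_softmax`, `kl_divergence`, and `mean_kld` are hypothetical and do not come from any particular tool.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def kl_divergence(ref_logits, test_logits):
    """KL(P_ref || P_test) in nats for one token position.

    ref_logits:  logits from the baseline run (assumed: unquantized KV cache)
    test_logits: logits from the run with a quantized KV cache
    """
    log_p = log_softmax(ref_logits)
    log_q = log_softmax(test_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def mean_kld(ref_rows, test_rows):
    """Average the per-position KLD over a sequence of token positions."""
    values = [kl_divergence(r, t) for r, t in zip(ref_rows, test_rows)]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Toy two-token vocabulary: identical logits give zero divergence,
    # perturbed logits give a positive value.
    baseline = [[0.0, 0.0], [1.0, 2.0]]
    quantized = [[math.log(3.0), 0.0], [1.0, 2.0]]
    print(mean_kld(baseline, quantized))
```

A mean KLD near zero means the quantized cache barely changes the model's output distribution, which is the sense in which a method is "nearly free". A large value means the predictions have drifted substantially from the baseline.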
