Analysis · Developers
27 days ago
KV cache quantization benchmarks compare TurboQuant, q5, and q8
Benchmarks run on a single RTX 3090 compare TurboQuant, TCQ, q5, and symmetric q8 for KV cache quantization. The results suggest that q5 deserves more attention and that symmetric q8 may waste VRAM. TurboQuant on its own looks overrated, but TCQ rescues it.
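To put the VRAM claim in concrete terms, here is a rough back-of-the-envelope sketch of KV cache size at different bit widths. Everything in it is an assumption for illustration: the model shape is hypothetical and not taken from the benchmarks, and the per-element cost assumes n-bit values plus one 16-bit scale per block of 32 values, which may not match the formats that were actually tested. TurboQuant and TCQ are left out because the summary gives no storage cost for them.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bits_per_elem):
    """Bytes needed for keys plus values across all layers at a given context length."""
    elems = 2 * n_layers * n_kv_heads * head_dim * n_ctx  # factor 2: one K and one V tensor
    return elems * bits_per_elem / 8


# Hypothetical model shape (assumption, not from the benchmarks).
SHAPE = dict(n_layers=32, n_kv_heads=8, head_dim=128, n_ctx=32768)

# Assumed storage cost per element: n-bit value plus a 16-bit scale shared by 32 values.
BITS = {"f16": 16.0, "q8": 8 + 16 / 32, "q5": 5 + 16 / 32}


def report():
    """Return the KV cache size in GiB for each assumed format."""
    return {name: kv_cache_bytes(bits_per_elem=b, **SHAPE) / 2**30
            for name, b in BITS.items()}


if __name__ == "__main__":
    for name, gib in report().items():
        print(f"{name}: {gib:.3f} GiB")
```

Under these assumptions the cache takes 4 GiB at f16, 2.125 GiB at q8, and 1.375 GiB at q5, so q5 frees about 0.75 GiB relative to q8 at this context length. The sketch covers memory only; whether q5 holds up on quality is what the benchmarks themselves address.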
