9 days ago
Huawei open-sources KVarN KV-cache quantization for vLLM
KVarN claims 3–5× KV-cache compression with a net speed-up rather than a slow-down and, unlike TurboQuant, is reported to hold up on reasoning tasks. Community benchmarks report that KVarN at 6 bits matches q8_0 precision and at 4 bits matches q5_0. It is licensed under Apache 2.0 and is said to integrate into vLLM with a single flag.
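To put the compression figures in context, here is a rough back-of-the-envelope sketch of how KV-cache size scales with bits per stored value. It is not KVarN's code: the function names and the model shape (32 layers, 8 KV heads, head dimension 128, 32,768-token context) are hypothetical, the baseline is assumed to be 16-bit, and the accounting ignores per-block scales or other quantization metadata.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bits_per_value):
    """Idealized KV-cache size in bytes for a single sequence."""
    # Each layer stores one K tensor and one V tensor, hence the factor of 2.
    total_values = 2 * num_layers * num_kv_heads * head_dim * seq_len
    return total_values * bits_per_value / 8


def compression_ratio(baseline_bits, quant_bits):
    """Size ratio relative to the baseline, ignoring quantization metadata."""
    return baseline_bits / quant_bits


if __name__ == "__main__":
    # Hypothetical model shape, chosen only for illustration.
    shape = dict(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=32768)
    baseline = kv_cache_bytes(bits_per_value=16, **shape)
    print(f"16-bit baseline: {baseline / 2**30:.2f} GiB")
    for bits in (6, 4):
        size = kv_cache_bytes(bits_per_value=bits, **shape)
        ratio = compression_ratio(16, bits)
        print(f"{bits}-bit: {size / 2**30:.2f} GiB ({ratio:.2f}x smaller)")
```

Under this idealized accounting, a 3–5× reduction from a 16-bit baseline corresponds to an average of roughly 3.2 to 5.3 bits per stored value. Real formats also store scales and similar metadata, so effective ratios will be somewhat lower than the raw bit-width ratio suggests.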
KV cache quant benchmarks: q5 & q6 are underrated, q8/q4 is bad, TCQ has a niche (Anbeeld, 17 days ago)
KVarN: new KV-cache quant from Huawei. 3–5× KV cache compression with actual speed-up instead of slow-down, and unlike TurboQuant it holds up on reasoning (Apache 2.0, vLLM single flag) (acluk90, 9 days ago)
KV cache quant benchmarks: KVarN 6-bit matches q8_0, 4-bit matches q5_0. Massive! (Anbeeld, 7 days ago)
