Huawei open-sources KVarN KV-cache quantization for vLLM


9 days ago


KVarN claims 3–5× KV-cache compression with a net speed-up, unlike TurboQuant, which slows inference down. In benchmarks, KVarN at 6 bits matches q8_0 precision and at 4 bits matches q5_0. It is licensed under Apache 2.0 and integrates into vLLM with a single flag.
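As a rough illustration of where compression figures like these come from, the sketch below computes the effective bits stored per cached element and the resulting ratio against a 16-bit (fp16/bf16) cache. It assumes a generic group-wise layout with one fp16 scale shared by each group of 32 values, which is the block layout llama.cpp's q8_0, q5_0 and q4_0 use. KVarN's actual storage format is not described here, so this is not its implementation; the function names and model dimensions are hypothetical.

```python
def effective_bits(bits: int, group_size: int, scale_bits: int = 16) -> float:
    """Bits stored per cached element: the quantized payload plus one
    shared scale amortized over its group (assumed generic layout)."""
    return bits + scale_bits / group_size


def compression_ratio(bits: int, group_size: int, baseline_bits: int = 16) -> float:
    """How many times smaller the cache is than a 16-bit baseline cache."""
    return baseline_bits / effective_bits(bits, group_size)


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_tokens: int, bits_per_elem: float) -> float:
    """Total KV-cache size in bytes. The factor 2 accounts for storing
    both keys and values in every layer."""
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bits_per_elem / 8


if __name__ == "__main__":
    for b in (8, 6, 5, 4):
        print(f"{b}-bit, group of 32: {effective_bits(b, 32):.2f} bits/elem, "
              f"{compression_ratio(b, 32):.2f}x vs 16-bit")

    # Hypothetical model shape, chosen only for illustration.
    layers, kv_heads, head_dim, tokens = 48, 8, 128, 32768
    base = kv_cache_bytes(layers, kv_heads, head_dim, tokens, 16)
    quant = kv_cache_bytes(layers, kv_heads, head_dim, tokens, effective_bits(4, 32))
    print(f"16-bit cache: {base / 2**30:.2f} GiB, 4-bit cache: {quant / 2**30:.2f} GiB")
```

Under this assumed layout, 4-bit groups of 32 cost 4.5 bits per element (about 3.56× smaller than 16-bit), while 8-bit costs 8.5 bits (about 1.88×). Real ratios also depend on group size and on any extra metadata such as zero-points or per-channel scales.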

KV cache quant benchmarks: q5 & q6 are underrated, q8/q4 is bad, TCQ has a niche (Anbeeld, 17 days ago)

KVarN: new KV-cache quant from Huawei. 3–5× KV cache compression with actual speed-up instead of slow-down, and unlike TurboQuant it holds up on reasoning (Apache 2.0, vLLM single flag) (acluk90, 9 days ago)

KV cache quant benchmarks: KVarN 6-bit matches q8_0, 4-bit matches q5_0. Massive! (Anbeeld, 7 days ago)

Qwen 3.6 27B KV cache quant benchmarks: 75 pairs, q8/q6/q5/q4, KVarN, Turbo/TCQ (Anbeeld, 6 days ago)

I implemented KVarN in my llama.cpp fork and ran KLD benchmarks. It's promising! (Anbeeld, 8 days ago)
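The KLD benchmarks mentioned in these threads presumably measure the KL divergence between the next-token distribution of an unquantized-cache reference run and that of a quantized-cache run, averaged over token positions; lower means the quantized cache changes the model's predictions less. The sketch below shows that computation on raw logits under this assumption. It is not the code used in any of the linked benchmarks, and the function names are hypothetical.

```python
import math


def log_softmax(logits: list[float]) -> list[float]:
    """Numerically stable log-softmax via the log-sum-exp trick."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def kl_divergence(ref_logits: list[float], test_logits: list[float]) -> float:
    """KL(P_ref || P_test) for one token position, where P_ref comes from the
    unquantized reference run and P_test from the quantized-cache run."""
    log_p = log_softmax(ref_logits)
    log_q = log_softmax(test_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def mean_kld(ref_batch: list[list[float]], test_batch: list[list[float]]) -> float:
    """Average per-position KL divergence (assumed aggregation)."""
    values = [kl_divergence(r, t) for r, t in zip(ref_batch, test_batch)]
    return sum(values) / len(values)


if __name__ == "__main__":
    reference = [[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]]
    quantized = [[1.9, 1.1, 0.1], [0.6, 0.4, 2.8]]
    print(f"mean KLD: {mean_kld(reference, quantized):.6f}")
```

KL divergence is asymmetric, so the direction matters; taking the reference run as P keeps the metric focused on how well the quantized run reproduces the original predictions.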

