Maybe KV cache offload to RAM isn't bad

Analysis, Developers

8 days ago

Reddit user bobaburger shares their experience with llama.cpp's `-nkvo` flag, which keeps the KV cache in system RAM instead of offloading it to GPU VRAM, and argues that the trade-off is acceptable. They run Qwen3.6 27B (IQ4_XS) and report that the performance impact is manageable on their hardware.
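
A rough size estimate helps show why the cache's location matters. The sketch below is only an illustration, assuming a standard transformer in which every layer stores full K and V tensors at 16-bit precision. The function name `kv_cache_bytes` and the layer, head, and context values are hypothetical placeholders, not the actual configuration of the model in the post.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx: int, bytes_per_elem: int = 2) -> int:
    """Estimate KV cache size for a plain attention stack.

    Assumes every layer stores one K and one V tensor of shape
    (n_ctx, n_kv_heads, head_dim); hybrid or sliding-window
    architectures will need less than this estimate.
    """
    # Factor of 2 accounts for the separate K and V tensors.
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


# Hypothetical values chosen for illustration only.
size = kv_cache_bytes(n_layers=64, n_kv_heads=8, head_dim=128, n_ctx=32768)
print(f"{size / 1024**3:.1f} GiB")
```

With a cache of this order, keeping it in system RAM frees the same amount of VRAM for model weights, at the cost of slower attention over the cached tokens.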
