AnalysisAI Models
Jun 15, 9:11 AM
Qwen3.6-27B doubles token speed, cuts KV cache VRAM on RTX 3090
On a single RTX 3090, Qwen3.6-27B Q4_K_M achieves 38.6 tok/s with 72 MiB resident KV cache at native 256K context. Needle recall remains 88-100% at 6% residency, and accuracy holds 36/36 vs full cache.
·
Jun 15, 9:11 AM
