Qwen3.6-27B doubles token speed, cuts KV cache VRAM on RTX 3090

Jun 15, 9:11 AM

On a single RTX 3090, Qwen3.6-27B with Q4_K_M quantization reaches 38.6 tok/s while keeping a 72 MiB resident KV cache at its native 256K context. Needle recall stays at 88 to 100% with 6% of the cache resident, and accuracy holds at 36/36 against the full-cache baseline.
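
To give a feel for why KV cache residency matters at long context, the sketch below estimates the size of a standard attention KV cache and the share of it that stays resident at a given fraction. The function names and every parameter value (layer count, KV heads, head dimension, fp16 elements) are hypothetical placeholders, not Qwen3.6-27B's actual configuration, and the formula is the generic dense-attention estimate rather than the method behind the figures reported above.

```python
MIB = 1024 ** 2


def kv_cache_bytes(n_tokens, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Generic estimate: keys plus values for every token in every attention layer."""
    # Factor of 2 covers the separate key and value tensors.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_tokens


def resident_bytes(full_bytes, residency):
    """Bytes kept in VRAM when only a fraction of the cache is resident."""
    if not 0.0 <= residency <= 1.0:
        raise ValueError("residency must be between 0 and 1")
    return round(full_bytes * residency)


if __name__ == "__main__":
    # Hypothetical model shape, chosen only for illustration.
    full = kv_cache_bytes(
        n_tokens=256 * 1024,  # 256K-token context
        n_layers=16,
        n_kv_heads=4,
        head_dim=128,
    )
    kept = resident_bytes(full, 0.06)
    print(f"full cache:     {full / MIB:.1f} MiB")
    print(f"resident at 6%: {kept / MIB:.1f} MiB")
```

The full cache grows linearly with context length, so at 256K tokens even a modest per-token footprint adds up to gigabytes. Keeping only a small resident fraction is what frees VRAM on a 24 GB card, provided recall and accuracy hold up as the article reports.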
