User runs Qwen3.6-27b at 55 tok/s on 4x RTX 5060 Ti for $1800


10 hours ago


Four RTX 5060 Ti 16GB cards, about $1800 in GPUs, run Qwen3.6-27b in FP8 at 55 tok/s with a 262K context and a BF16 KV cache. The cards communicate with each other over P2P. The reported speed is for single-user inference.
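As a rough sanity check on those numbers, the memory budget can be sketched in a few lines. This is back-of-the-envelope arithmetic, not a measurement: it assumes exactly 27 billion parameters at 1 byte each for FP8, uses decimal GB, and ignores quantization scales, activations, and framework overhead. The function name `vram_budget` is hypothetical.

```python
def vram_budget(num_gpus, gb_per_gpu, params_billion, bytes_per_param):
    """Estimate pooled VRAM, weight footprint, and what is left over (decimal GB)."""
    total_gb = num_gpus * gb_per_gpu
    # 1e9 params * bytes_per_param / 1e9 bytes per GB simplifies to this product
    weights_gb = params_billion * bytes_per_param
    return {
        "total_gb": total_gb,
        "weights_gb": weights_gb,
        "left_gb": total_gb - weights_gb,
    }

# Assumed inputs: 4 cards x 16 GB, 27B parameters, FP8 = 1 byte per parameter
budget = vram_budget(num_gpus=4, gb_per_gpu=16, params_billion=27, bytes_per_param=1)
cost_per_gpu = 1800 / 4  # dollars, from the reported $1800 total

print(budget)
print(cost_per_gpu)
```

Under these assumptions the weights take about 27 GB of the 64 GB pool, so roughly 37 GB remains for the BF16 KV cache and everything else, at about $450 per card.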

"This is amazing. Token speed doubled + kv cache now need low vram - qwen 27b" (posted by 9r4n4y, 4 days ago)
