RTX 5080 runs Qwen3.6 35B MoE at 56 tok/s with 128k context

27 days ago

Qwen3.6 35B MoE reaches 56 tok/s on a 16GB RTX 5080 at 128k context with Q4_K_XL quantization. Multi-Token Prediction (MTP) in llama.cpp gave no speedup in this configuration.
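
To put the 56 tok/s figure in practical terms, here is a rough sketch of how long a reply of a given length would take to generate. It assumes the decode rate stays constant and ignores prompt processing time; in practice throughput usually falls as the context fills, so treat these as optimistic estimates. The function name `generation_seconds` is illustrative only and is not part of llama.cpp.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float = 56.0) -> float:
    """Estimate decode time for num_tokens at a steady rate.

    Assumption: the rate is constant (56 tok/s is the figure reported above)
    and prompt processing time is not included.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    # Reply lengths chosen only for illustration.
    for n in (256, 1024, 4096):
        print(f"{n} tokens: about {generation_seconds(n):.1f} s")
```

The same function can be reused with a different rate to compare configurations, for example a run with and without MTP, if both measurements are available.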
