Analysis · AI Models
27 days ago
RTX 5080 runs Qwen3.6 35B MoE at 56 tok/s with 128k context
Qwen3.6 35B MoE reaches 56 tok/s on a 16 GB RTX 5080 at 128k context with Q4_K_XL quantization. Multi-Token Prediction (MTP) in llama.cpp gave no speedup in this configuration.
