llama.cpp build b9455 improves performance on 2x3090


11 days ago


A user reports 30-50 tokens/s running the Qwen3.6-27B-UD-Q8_K_XL GGUF on two RTX 3090s with llama.cpp build b9455, surpassing vLLM's tensor-parallel performance. The build is also reported to be significantly faster than previous llama.cpp versions.

