Analysis, Developers
11 days ago
llama.cpp build b9455 improves performance on 2x3090
A user reports 30-50 tokens/s on the Qwen3.6-27B-UD-Q8_K_XL GGUF with llama.cpp build b9455, surpassing vLLM's tensor-parallel performance. The build shows a significant speedup over previous llama.cpp versions.
