Analysis · Developers
14 days ago
llama.cpp build b9455 boosts speed on dual RTX 3090
A user reports improved performance with llama.cpp build b9455 on 2x RTX 3090 GPUs, up from the 30-50 tk/s they were getting previously. The new build appears to close the gap with vLLM for tensor-parallel inference.
