llama.cpp build b9455 boosts speed on dual RTX 3090

14 days ago

A user reports improved performance with llama.cpp build b9455 on two RTX 3090 GPUs, compared with the 30-50 tokens/s they were previously getting. The new build appears to close the gap with vLLM for tensor-parallel inference.