
llama.cpp build b9455 boosts speed on dual RTX 3090

A user reports improved performance with llama.cpp build b9455 on two RTX 3090 GPUs, a setup that previously delivered 30-50 tokens/s. The new build appears to close the gap with vLLM for tensor-parallel inference.

14 days ago