
User benchmarks dual-GPU LLM inference speed with llama.cpp split methods

A Reddit user compared inference speed on two RTX 3090 GPUs, pitting llama.cpp's row/tensor split against ik_llama's graph split. The post details the setup and the benchmark results.

2 days ago