
Dual-GPU inference speed compared across llama.cpp and ik_llama splits

A Reddit benchmark compares dual-GPU inference speed between llama.cpp's row/tensor split and ik_llama's graph split, using CUDA 13.3. Results cover throughput and latency differences.

1 day ago