Dual-GPU inference speed compared across llama.cpp and ik_llama splits

1 day ago

A benchmark posted on Reddit compares dual-GPU inference speed between llama.cpp's row/tensor split and ik_llama's graph split, run with CUDA 13.3. The results cover differences in throughput and latency.
