Analysis · Developers
1 day ago
Dual-GPU inference speed compared across llama.cpp and ik_llama splits
A Reddit benchmark run on CUDA 13.3 compares dual-GPU inference speed under llama.cpp's row/tensor split and ik_llama's graph split. The results cover throughput and latency differences.
