Analysis · Developers
2 days ago
User benchmarks dual-GPU LLM inference speed with llama.cpp split methods
A Reddit user compared inference speed on two RTX 3090 GPUs using llama.cpp's row/tensor split versus ik_llama's graph split. The post details the setup and the benchmark results.
