User benchmarks dual-GPU LLM inference speed with llama.cpp split methods

Analysis, Developers

2 days ago


A Reddit user compared inference speed on two RTX 3090 GPUs using llama.cpp's row/tensor split and ik_llama's graph split. The post describes the setup and reports the benchmark results.
