Analysis · Developers
30 days ago
llama.cpp dual-GPU tensor parallelism fix
The fix addresses a long-standing limitation in llama.cpp: `--split-mode tensor` did not support quantized KV caches. A community contributor's patch enables tensor parallelism with a quantized KV cache, improving dual-GPU performance.
