Fork of llama.cpp adds --numa mirror mode for multi-socket CPU inference


19 hours ago


A fork of llama.cpp introduces a --numa mirror mode that duplicates the model weights on each NUMA node, so that threads on every socket read weights from local memory, maximizing aggregate memory bandwidth on multi-socket CPU systems. The feature is available on GitHub, and the author invites testers to measure the performance gains.
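The mirroring idea can be sketched as a small conceptual model. This is not the fork's implementation: the `MirroredWeights` class and its methods are hypothetical, and ordinary in-process copies stand in for node-local allocations. A real implementation would place each replica in a specific node's memory, for example with libnuma. The sketch shows one replica per node, a lookup that gives each worker its own node's copy, and the memory cost of duplication.

```python
from dataclasses import dataclass, field


@dataclass
class MirroredWeights:
    """Hypothetical model of mirroring: one full weight replica per NUMA node."""
    weights: bytes
    num_nodes: int
    replicas: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Make a separate copy for every node. In a real system each copy would
        # live in that node's local memory; here they are plain bytearrays.
        for node in range(self.num_nodes):
            self.replicas[node] = bytearray(self.weights)

    def local_copy(self, node: int) -> bytearray:
        # A worker pinned to `node` reads only its own node's replica,
        # so weight reads never have to cross the socket interconnect.
        return self.replicas[node]

    def total_bytes(self) -> int:
        # Duplication trades memory capacity for bandwidth:
        # the footprint grows linearly with the number of nodes.
        return len(self.weights) * self.num_nodes


if __name__ == "__main__":
    mirrored = MirroredWeights(b"\x00" * 1024, num_nodes=2)
    print(mirrored.total_bytes())  # 2048
```

The cost of this design is visible in `total_bytes`: each additional node holds a full copy of the weights, so mirroring only fits when every node has enough local memory for the whole model.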

