Launch, Developers
19 hours ago
Fork of llama.cpp adds --numa mirror mode for multi-socket CPU inference
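A fork of llama.cpp introduces a --numa mirror mode that keeps a full copy of the model weights on each NUMA node, so that threads on every socket can read weights from local memory instead of crossing the inter-socket link. The goal is to maximize usable memory bandwidth for CPU inference on multi-socket systems, at the cost of additional RAM for each copy. The fork is available on GitHub, and the author invites testers to evaluate the performance gains.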
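The trade-off behind mirroring can be sketched with some back-of-envelope arithmetic. The Python snippet below is only an illustration, not code from the fork: the function name `mirror_tradeoff` and all numbers are hypothetical, and it assumes a dense model whose weights are read once per generated token, with the work split evenly across nodes.

```python
def mirror_tradeoff(model_gib: float, numa_nodes: int, per_node_bw_gib_s: float) -> dict:
    """Rough estimate of the cost and benefit of mirroring weights per NUMA node.

    Assumptions (illustrative, not taken from the fork):
      - every node holds one full copy of the weights;
      - each token requires reading all weights once in total;
      - nodes split that read evenly and use only local memory.
    """
    if numa_nodes < 1:
        raise ValueError("numa_nodes must be at least 1")
    if model_gib <= 0 or per_node_bw_gib_s <= 0:
        raise ValueError("model size and bandwidth must be positive")

    # Memory cost: one full copy of the weights per node.
    total_memory_gib = model_gib * numa_nodes

    # Best case: local bandwidths add up because no read crosses sockets.
    aggregate_bw_gib_s = per_node_bw_gib_s * numa_nodes

    # Memory-bound ceiling on generation speed under the assumptions above.
    tokens_per_s_upper_bound = aggregate_bw_gib_s / model_gib

    return {
        "total_memory_gib": total_memory_gib,
        "aggregate_bw_gib_s": aggregate_bw_gib_s,
        "tokens_per_s_upper_bound": tokens_per_s_upper_bound,
    }


if __name__ == "__main__":
    # Hypothetical two-socket machine and model size, chosen for round numbers.
    print(mirror_tradeoff(model_gib=40.0, numa_nodes=2, per_node_bw_gib_s=200.0))
```

The result is an upper bound only; real throughput depends on thread placement, compute limits, and how the fork actually partitions work, which is why measurements from testers matter.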
