
Fork of llama.cpp adds --numa mirror mode for multi-socket CPU inference

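A fork of llama.cpp adds a --numa mirror mode that keeps a full copy of the model weights on each NUMA node, so that threads on every socket can read weights from local memory rather than across the inter-socket link. The aim is to maximize usable memory bandwidth on multi-socket CPU systems. The fork is available on GitHub, and the author is inviting testers to measure the performance gains.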
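The trade-off behind mirroring can be illustrated with some rough arithmetic. The sketch below is not taken from the fork: the function names, the 40 GiB model size, and the 100 GiB/s per-node bandwidth are hypothetical values chosen for illustration, and the throughput figure assumes an idealized, purely memory-bound decoder that reads every weight once per token and splits the work evenly across nodes.

```python
def mirror_footprint_gib(model_gib: float, numa_nodes: int) -> float:
    """Total RAM used by weights when every NUMA node holds its own full copy."""
    if numa_nodes < 1:
        raise ValueError("numa_nodes must be at least 1")
    return model_gib * numa_nodes


def ideal_tokens_per_second(model_gib: float, per_node_bw_gib_s: float,
                            numa_nodes: int) -> float:
    """Idealized upper bound on decode speed (an assumption, not a measurement).

    Assumes each node reads only its local copy and handles an equal share
    of the weights for every token, so local bandwidths simply add up.
    """
    aggregate_bw = per_node_bw_gib_s * numa_nodes
    return aggregate_bw / model_gib


# Hypothetical example: a 40 GiB model on a 2-socket machine
# with 100 GiB/s of local memory bandwidth per socket.
footprint = mirror_footprint_gib(40.0, 2)
bound = ideal_tokens_per_second(40.0, 100.0, 2)
print(f"weights in RAM: {footprint} GiB")
print(f"ideal upper bound: {bound} tokens/s")
```

Under these assumptions, mirroring doubles the memory used by weights on a two-socket machine in exchange for letting both sockets stream weights at local speed. Real gains depend on the workload and hardware, which is why the author is asking for test results.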
