Jun 21, 11:01 PM
Local LLM inference optimization: the complete guide
A practical guide that consolidates a year of local LLM experiments into llama.cpp optimization advice. It covers VRAM fitting, KV cache tuning, MoE placement, MTP speculative decoding, and CPU tuning, including notes on enabling XMP and pinning to P-cores.
