llama.cpp tip: free more GPU memory with --no-mmap and --mlock

Jun 17, 6:23 PM

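Recent llama.cpp updates reportedly fix memory leaks. Users recommend running with the flags --n-gpu-layers 99 --no-mmap --mlock. Here --n-gpu-layers 99 offloads all model layers to the GPU (99 exceeds the layer count of most models), --no-mmap loads the model without memory-mapping the file, and --mlock asks the OS not to swap out whatever model memory remains on the host. The goal is to keep the weights in VRAM and rely on system RAM as little as possible. Users report this works well on eGPU setups, for example with an RTX 3090.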
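
As a minimal sketch of the flag combination described above: the binary name (llama-cli), the model path, and the LLAMA_BIN, MODEL, and LLAMA_FLAGS variable names are assumptions for illustration. Older builds may name the binary differently, so adjust to your setup. The script prints the command and only runs it if both the binary and the model file exist.

```shell
#!/bin/sh
# Hypothetical locations: adjust to your own build and model file.
LLAMA_BIN="${LLAMA_BIN:-./llama-cli}"
MODEL="${MODEL:-./models/model.gguf}"

# Flags from the tip: offload all layers, skip mmap, lock host memory.
LLAMA_FLAGS="--n-gpu-layers 99 --no-mmap --mlock"

# Show the full command so it can be inspected before running.
echo "$LLAMA_BIN -m $MODEL $LLAMA_FLAGS"

# Run only when the binary and model are actually present.
if [ -x "$LLAMA_BIN" ] && [ -f "$MODEL" ]; then
  # LLAMA_FLAGS is intentionally unquoted so it splits into separate arguments.
  "$LLAMA_BIN" -m "$MODEL" $LLAMA_FLAGS -p "Hello"
fi
```

If the model does not fit entirely in VRAM, lowering the --n-gpu-layers value keeps some layers on the CPU instead.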
