Jun 8, 11:58 PM
Pipeline parallelism in llama.cpp may waste VRAM, testing shows
Testing revealed that pipeline parallelism in llama.cpp provides no speed benefit while consuming significant VRAM. The issue can be avoided by building llama.cpp with the CMake option -DGGML_SCHED_MAX_COPIES=1.
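A minimal sketch of that build step, written in Python around the standard CMake command line. It assumes a local llama.cpp checkout and CMake on the PATH. The `-DGGML_CUDA=ON` backend flag and the helper names (`configure_args`, `build_args`, `build_llama_cpp`) are illustrative assumptions, not part of the original report; only `-DGGML_SCHED_MAX_COPIES=1` comes from the testing described above.

```python
import subprocess
from pathlib import Path


def configure_args(src: Path, build: Path, max_copies: int = 1,
                   cuda: bool = True) -> list[str]:
    """Return the CMake configure command for a llama.cpp checkout.

    GGML_SCHED_MAX_COPIES=1 caps the ggml scheduler at a single buffer
    copy, which (per the report) avoids the pipeline-parallel VRAM cost.
    """
    args = ["cmake", "-S", str(src), "-B", str(build),
            f"-DGGML_SCHED_MAX_COPIES={max_copies}"]
    if cuda:
        # Assumption: an NVIDIA multi-GPU build; swap for your backend's flag.
        args.append("-DGGML_CUDA=ON")
    return args


def build_args(build: Path) -> list[str]:
    """Return the CMake build command (Release config, parallel jobs)."""
    return ["cmake", "--build", str(build), "--config", "Release", "-j"]


def build_llama_cpp(src: Path, build: Path, cuda: bool = True) -> None:
    """Configure and compile; raises CalledProcessError if a step fails."""
    subprocess.run(configure_args(src, build, cuda=cuda), check=True)
    subprocess.run(build_args(build), check=True)
```

For example, `build_llama_cpp(Path("llama.cpp"), Path("llama.cpp/build"))` would configure and compile a checkout in a hypothetical `llama.cpp` directory. Keeping the argument lists in separate functions makes the flag easy to inspect before anything is run.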