LaunchDevelopers
27 days ago
llama.cpp moves MTP sampling to backend
PR #23287 moves sampling on the MTP (multi-token prediction) draft path to the backend to improve performance. The change optimizes multi-token prediction as it is used in speculative decoding.
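For readers unfamiliar with the draft path mentioned above, here is a minimal sketch of one greedy speculative-decoding step: a cheap drafter proposes a few tokens, and the target model keeps the longest prefix it agrees with. This is a toy illustration and not llama.cpp code or the contents of the PR. The names `speculative_step`, `draft_next`, `target_next`, and `n_draft` are hypothetical, the "models" are plain Python functions, and the backend-side sampling that the PR concerns is not modelled at all.

```python
from typing import Callable, List

Token = int
# Hypothetical stand-in for a model: maps a token context to the next token.
NextToken = Callable[[List[Token]], Token]


def speculative_step(
    context: List[Token],
    draft_next: NextToken,
    target_next: NextToken,
    n_draft: int,
) -> List[Token]:
    """Run one greedy draft-then-verify step and return the newly accepted tokens."""
    # 1. Draft: the cheap predictor proposes n_draft tokens in a row.
    draft: List[Token] = []
    for _ in range(n_draft):
        draft.append(draft_next(context + draft))

    # 2. Verify: keep draft tokens while they match the target's greedy choice.
    #    (A real system would check all positions in one batched pass;
    #    this sketch calls the target once per position for clarity.)
    accepted: List[Token] = []
    for tok in draft:
        expected = target_next(context + accepted)
        if tok != expected:
            # First mismatch: take the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # 3. Every draft token matched, so the target adds one extra token.
    accepted.append(target_next(context + accepted))
    return accepted


# Toy models (assumptions for illustration only).
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10  # always counts upward


def toy_draft(ctx: List[Token]) -> Token:
    # Agrees with the target except after token 3, where it guesses wrong.
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10


print(speculative_step([1], toy_draft, toy_target, n_draft=3))
```

In this toy setup the drafter's third guess is rejected and replaced by the target's token, so the step still advances by three tokens. The output always matches what the target alone would produce greedily; the potential saving comes from verifying several drafted tokens together.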