llama.cpp PR optimizes Top-N-Sigma sampler by removing softmax+sort


Jun 22, 5:18 PM


The PR removes an unconditional softmax+sort step from the Top-N-Sigma sampler, which improves performance when the sampler is followed by the Dist sampler. On an M3 Max MacBook Pro, the change increases generation throughput (tokens per second).
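
To illustrate why the softmax and sort are not needed, here is a minimal Python sketch of a Top-N-Sigma filter. It assumes the usual definition of the technique: keep every token whose logit is within n standard deviations of the maximum logit. It is not the llama.cpp code. The function name `top_n_sigma_filter`, the use of the population standard deviation, and the handling of `n <= 0` are all assumptions made for this example.

```python
import math


def top_n_sigma_filter(logits, n):
    """Mask logits below (max - n * sigma) with -inf.

    Hypothetical helper for illustration only. It works on raw logits,
    so it needs neither a softmax nor a sort, and it keeps token order.
    """
    # Assumption: a non-positive n (or empty input) disables the filter.
    if n <= 0 or not logits:
        return list(logits)

    max_logit = max(logits)
    mean = sum(logits) / len(logits)
    # Assumption: population standard deviation (divide by N).
    variance = sum((x - mean) ** 2 for x in logits) / len(logits)
    sigma = math.sqrt(variance)

    threshold = max_logit - n * sigma
    return [x if x >= threshold else -math.inf for x in logits]


if __name__ == "__main__":
    # Two high logits survive; the two low ones are masked out.
    print(top_n_sigma_filter([10.0, 9.0, 1.0, 0.0], 1.0))
```

The whole filter is a few linear passes over the logits. A later sampling stage can then compute probabilities only once, over the surviving tokens, which is consistent with the speedup the PR describes when the Dist sampler follows.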

