
llama.cpp PR optimizes Top-N-Sigma sampler by removing softmax+sort

The PR removes an unconditional softmax and sort from the Top-N-Sigma sampler, which improves performance when that sampler is followed by the Dist sampler. On an M3 Max MacBook Pro, the change increases throughput in tokens per second.
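
To illustrate why the softmax and sort are avoidable, here is a minimal Python sketch. It assumes the commonly described top-n-sigma rule: keep every token whose logit lies within n standard deviations of the maximum logit. The function names `top_n_sigma_filter` and `softmax` are hypothetical and are not llama.cpp's API; the idea that a later sampling step does the normalization is likewise an assumption about how the pieces fit together, not a description of the PR's code.

```python
import math

def top_n_sigma_filter(logits, n):
    """Keep (index, logit) pairs with logit >= max - n * std.

    Only the maximum and the standard deviation of the raw logits are
    needed, so no softmax and no sort are performed here.
    """
    max_logit = max(logits)
    mean = sum(logits) / len(logits)
    variance = sum((x - mean) ** 2 for x in logits) / len(logits)
    threshold = max_logit - n * math.sqrt(variance)
    return [(i, x) for i, x in enumerate(logits) if x >= threshold]

def softmax(candidates):
    """Normalize the surviving logits once, as a final sampling step might."""
    peak = max(x for _, x in candidates)
    weights = [math.exp(x - peak) for _, x in candidates]
    total = sum(weights)
    return [(i, w / total) for (i, _), w in zip(candidates, weights)]

# Hypothetical logits for a four-token vocabulary.
kept = top_n_sigma_filter([2.0, 1.0, 0.0, -5.0], n=1.0)
probs = softmax(kept)
```

In this sketch the filter makes a few linear passes over the logits, and normalization happens only once, on the tokens that survive. That is the kind of redundant work an unconditional softmax and sort inside the filter would add.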

Jun 22, 5:18 PM