Jun 22, 5:18 PM
llama.cpp PR optimizes Top-N-Sigma sampler by removing softmax+sort
The PR removes an unconditional softmax and sort from the Top-N-Sigma sampler, which improves performance when the sampler is followed by the Dist sampler. On an M3 Max MacBook Pro, the change increases tokens per second.
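
To illustrate why the softmax and sort are not needed, here is a minimal Python sketch of Top-N-Sigma filtering as it is commonly described: keep every token whose logit is within n standard deviations of the maximum logit. This is not the llama.cpp code. The function names `top_n_sigma_filter` and `dist_sample` are hypothetical, and the use of population standard deviation is an assumption. The point is that the filter only needs a max, a mean, and a variance over the raw logits, so normalization can be left to whatever sampler runs next.

```python
import math
import random


def top_n_sigma_filter(logits, n=1.0):
    """Return indices of tokens whose logit is >= max - n * std.

    Works on raw logits: no softmax and no sort are required.
    (Assumption: population standard deviation over all logits.)
    """
    max_logit = max(logits)
    mean = sum(logits) / len(logits)
    variance = sum((x - mean) ** 2 for x in logits) / len(logits)
    threshold = max_logit - n * math.sqrt(variance)
    return [i for i, x in enumerate(logits) if x >= threshold]


def dist_sample(logits, kept, rng):
    """Stand-in for a final distribution sampler (hypothetical helper).

    Applies a softmax over the surviving tokens only, then draws one.
    """
    top = max(logits[i] for i in kept)
    weights = [math.exp(logits[i] - top) for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


logits = [5.0, 4.5, 1.0, 0.0, -1.0]
kept = top_n_sigma_filter(logits, n=1.0)
token = dist_sample(logits, kept, random.Random(0))
```

In this sketch the only softmax happens once, inside the final sampling step and over the surviving tokens, which is consistent with the summary's note that the gain shows up when the Dist sampler follows.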
