LaunchDevelopers
23 days ago
llama.cpp adds fast Walsh-Hadamard transform for CUDA
A new CUDA implementation of the fast Walsh-Hadamard transform (FWHT) improves llama.cpp performance by 1-2% on prompt processing and 7-9% on token generation. The transform is used when quantizing the KV cache.
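
For readers unfamiliar with the transform, here is a minimal CPU reference sketch in Python. It is not the llama.cpp CUDA kernel, and the function names `fwht` and `fwht_orthonormal` are hypothetical, chosen only for illustration. It shows the standard in-place butterfly, which takes O(n log n) additions and subtractions instead of the O(n^2) work of multiplying by the full Hadamard matrix, and it assumes the input length is a power of two.

```python
import math


def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform (natural/Hadamard order).

    Illustrative reference only; assumes len(values) is a power of two.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")

    out = list(values)  # work on a copy so the caller's data is untouched
    h = 1
    while h < n:
        # Each pass combines elements that are h apart inside blocks of 2*h.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


def fwht_orthonormal(values):
    """FWHT scaled by 1/sqrt(n), so applying it twice returns the input."""
    scale = 1.0 / math.sqrt(len(values))
    return [x * scale for x in fwht(values)]


if __name__ == "__main__":
    x = [1, 0, 1, 0, 0, 1, 1, 0]
    print(fwht(x))  # [4, 2, 0, -2, 0, 2, 0, 2]
```

With the 1/sqrt(n) scaling the transform is its own inverse, so a vector rotated before quantization can be rotated back with the same routine after dequantization.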
