BeeLlama v0.3.1: 177.8 tps on RTX 3090 with optimizations

Jun 4, 9:25 PM

BeeLlama v0.3.1, a fork of llama.cpp, reaches up to 177.8 tokens per second (tps) on a single RTX 3090 with Qwen 3.6 27B and Gemma 4 31B, a reported 4.93x speedup over baseline. The release adds DFlash, MTP, a q6_0 cache, and TurboQuant, and brings the fork in line with the latest upstream llama.cpp.
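As a rough sanity check on those two figures, the baseline throughput they imply can be backed out by dividing the peak rate by the speedup factor. This is only a sketch: it assumes the 177.8 tps peak and the 4.93x factor describe the same run, which the post does not state explicitly, and `implied_baseline_tps` is a hypothetical helper name, not part of BeeLlama or llama.cpp.

```python
def implied_baseline_tps(optimized_tps: float, speedup: float) -> float:
    """Baseline throughput implied by an optimized rate and a speedup factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return optimized_tps / speedup


# Figures quoted in the post; pairing them with each other is an assumption.
baseline = implied_baseline_tps(177.8, 4.93)
print(f"implied baseline: {baseline:.1f} tps")  # prints "implied baseline: 36.1 tps"
```

Under that assumption, the unoptimized baseline would sit at roughly 36 tps on the same card.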

BeeLlama v0.3.1 – latest llama.cpp with extras! DFlash, MTP, q6_0 cache, TurboQuant. Single RTX 3090: Qwen 3.6 27B & Gemma 4 31B up to 177.8 tps (4.93x over baseline)

Posted 13 days ago by Anbeeld
