Launch · Developers
Jun 4, 9:25 PM
BeeLlama v0.3.1: 177.8 tps on RTX 3090 with optimizations
BeeLlama v0.3.1, a fork of llama.cpp, reaches up to 177.8 tokens per second on a single RTX 3090 with Qwen 3.6 27B and Gemma 4 31B, a 4.93x speedup over the baseline. The release adds DFlash, MTP (multi-token prediction), a q6_0 cache, and TurboQuant, and syncs the fork with upstream llama.cpp.
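The announcement does not state the baseline throughput. The back-of-the-envelope sketch below estimates it. It assumes that the 4.93x figure and the 177.8 tokens/s figure describe the same model, GPU and settings, which the source does not confirm. The helper names `implied_baseline_tps` and `ms_per_token` are hypothetical and exist only for this illustration.

```python
# Hypothetical sanity check. ASSUMPTION: the 4.93x speedup and the
# 177.8 tokens/s peak were measured on the same model, GPU and settings.
PEAK_TPS = 177.8   # reported best-case throughput (tokens/s)
SPEEDUP = 4.93     # reported speedup over the unspecified baseline


def implied_baseline_tps(peak_tps: float, speedup: float) -> float:
    """Baseline throughput that would make peak_tps a `speedup`x gain."""
    return peak_tps / speedup


def ms_per_token(tps: float) -> float:
    """Average wall-clock time to emit one token, in milliseconds."""
    return 1000.0 / tps


baseline = implied_baseline_tps(PEAK_TPS, SPEEDUP)
print(f"implied baseline: {baseline:.1f} tok/s")
print(f"per-token latency: {ms_per_token(baseline):.1f} ms -> "
      f"{ms_per_token(PEAK_TPS):.1f} ms")
```

If that assumption holds, the baseline works out to roughly 36 tokens/s. That is about 27.7 ms per token, compared with about 5.6 ms at the peak figure.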
