BeeLlama v0.3.1: llama.cpp fork with DFlash, MTP, TurboQuant

9 days ago

Up to 177.8 tokens/s on Qwen 3.6 27B and Gemma 4 31B on a single RTX 3090, a 4.93x speedup over baseline. Features DFlash, MTP (multi-token prediction), q6_0 KV cache, TurboQuant, and multi-slot/multi-GPU support.
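The two headline numbers together imply a baseline throughput. This is a minimal sketch, assuming the 177.8 tokens/s peak and the 4.93x speedup were measured on the same model, GPU, and workload (the announcement does not state this). The helper name implied_baseline_tps is hypothetical.

```python
# Hypothetical helper: recover the baseline throughput implied by a peak
# throughput and a speedup factor. Assumes both figures describe the same
# model, GPU, and workload, which the announcement does not confirm.
def implied_baseline_tps(peak_tps: float, speedup: float) -> float:
    """Return baseline tokens/s, given peak_tps = speedup * baseline."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return peak_tps / speedup


baseline = implied_baseline_tps(177.8, 4.93)
print(f"implied baseline: {baseline:.1f} tokens/s")
```

Under that assumption, the unaccelerated baseline works out to roughly 36 tokens/s.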
