Launch · Developers

BeeLlama v0.3.1: llama.cpp fork with DFlash, MTP, TurboQuant

Reaches up to 177.8 tokens per second (tps) on Qwen 3.6 27B and Gemma 4 31B on a single RTX 3090, a 4.93x speedup over baseline. Features include DFlash, MTP, a q6_0 cache, TurboQuant, and multi-slot and multi-GPU support.
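As a rough sanity check on those two figures, the sketch below backs out the baseline throughput they imply. It assumes the 177.8 tps peak and the 4.93x speedup describe the same model and configuration, which the brief does not state; the function name `implied_baseline_tps` is hypothetical and used only for illustration.

```python
def implied_baseline_tps(peak_tps: float, speedup: float) -> float:
    """Return the baseline throughput implied by a peak figure and a speedup.

    Assumption (not confirmed by the brief): both numbers were measured
    on the same model and hardware configuration.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return peak_tps / speedup


if __name__ == "__main__":
    # Figures quoted in the brief: 177.8 tps peak, 4.93x over baseline.
    baseline = implied_baseline_tps(177.8, 4.93)
    print(f"implied baseline: {baseline:.1f} tps")
```

Under that assumption the baseline works out to roughly 36 tps. If the peak and the speedup come from different models or settings, the two numbers cannot be combined this way.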

9 days ago