Gemma4 26b A4B apex quant achieves 38 tps on local hardware

Analysis, AI Models

25 days ago


A user reports 38 tokens per second (tps) at a 90,000-token context with an Apex-quantized Gemma4 26B A4B model, running on an RX 9060 XT 16 GB with llama.cpp's Vulkan backend. The quantized model is a 15 GB GGUF file, and the user reports no quality degradation compared with a previous quantization.
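To put the reported numbers in perspective, here is a rough back-of-the-envelope sketch. It assumes the 38 tps figure is a steady generation rate (prompt processing time is ignored) and that the 15 GB file size and 16 GB card capacity are in the same units. The function names are hypothetical, and the headroom figure is simple subtraction; the source does not say how much of the model or context cache actually resides in VRAM.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float = 38.0) -> float:
    """Seconds needed to generate num_tokens at a steady decode rate.

    The 38.0 default is the rate reported in the summary; treating it as
    constant over the whole response is an assumption.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


def headroom_gb(vram_gb: float = 16.0, model_file_gb: float = 15.0) -> float:
    """Nominal difference between card capacity and GGUF file size.

    Plain subtraction of the two reported figures; real memory use
    (context cache, buffers, partial offload) is not given in the source.
    """
    return vram_gb - model_file_gb


if __name__ == "__main__":
    for n in (100, 500, 2000):
        print(f"{n} tokens: about {generation_seconds(n):.1f} s at 38 tps")
    print(f"Nominal headroom: {headroom_gb():.1f} GB")
```

Under these assumptions, a 500-token reply would take roughly 13 seconds to generate.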
