Google quantization broken, unsloth Q4_K_XL recommended

Analysis

Jun 8, 10:02 PM

Google's QAT quantization incorrectly stores the token embeddings as Q6_K rather than at the intended precision. For now, users are advised to use unsloth's UD Q4_K_XL with --pure. The --pure flag only partially addresses the bug, and other issues remain.
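
One way to spot this kind of mismatch is to compare each tensor's reported quantization type against the type you expected. The sketch below is only an illustration: the helper `find_unexpected_types` is hypothetical, the tensor names and types are hand-written sample values rather than data read from a real model file, and the expected type `Q4_K` is an assumption, since the intended precision is not stated here.

```python
# Hypothetical helper: flags tensors whose quantization type differs
# from the expected one. Not part of any real library.
def find_unexpected_types(tensor_types, expected):
    """Return {name: (actual, expected)} for tensors whose type differs."""
    mismatches = {}
    for name, want in expected.items():
        got = tensor_types.get(name)
        # Only report tensors that are present and have a different type.
        if got is not None and got != want:
            mismatches[name] = (got, want)
    return mismatches


# Illustrative values only; not read from any real model file.
example_types = {
    "token_embd.weight": "Q6_K",
    "blk.0.attn_q.weight": "Q4_K",
}

# Assumed target type, chosen for illustration; the source does not state it.
example_expected = {"token_embd.weight": "Q4_K"}

print(find_unexpected_types(example_types, example_expected))
```

In practice the name-to-type mapping would come from whatever tool you use to inspect the quantized file; the comparison step stays the same.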
