Analysis · AI Models
8 days ago
Qift quantization method targets W2A4/KV4 LLM inference
Qift proposes a shift-friendly, zero-free two-bit weight level set for rotated W2A4/KV4 LLM inference (2-bit weights, 4-bit activations, 4-bit KV cache), addressing the collapse issues seen with standard W2 quantization. The work studies the geometry of scalar level sets with the aim of improving memory efficiency.
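For readers unfamiliar with the idea of a "zero-free" 2-bit level set, here is a minimal sketch of per-group scalar quantization onto four nonzero levels. The level set `{-2, -1, +1, +2}` is a hypothetical choice made only for illustration: its power-of-two magnitudes are one plausible reading of "shift-friendly", since multiplying by such a level can in principle be done with a bit shift and sign flip. This is not Qift's actual level set, and it omits the rotation step entirely. The function names `quantize_w2` and `dequantize_w2` and the `group_size` parameter are also assumptions.

```python
import numpy as np

# Hypothetical zero-free 2-bit level set (illustrative assumption, NOT Qift's).
# Power-of-two magnitudes mean a multiply by a level could in principle be a shift.
LEVELS = np.array([-2.0, -1.0, 1.0, 2.0])

def quantize_w2(weights, group_size=4):
    """Map each weight to the nearest LEVELS entry times a per-group scale.

    The number of weights must be divisible by group_size.
    Returns (codes, scales): codes are 2-bit indices (0..3) into LEVELS.
    """
    w = np.asarray(weights, dtype=np.float64).reshape(-1, group_size)
    # Absmax scale per group, chosen so the largest |weight| maps to level magnitude 2.
    scales = np.abs(w).max(axis=1, keepdims=True) / 2.0
    scales[scales == 0] = 1.0  # guard against all-zero groups
    normalized = w / scales
    # Nearest-level search; because no level is zero, every weight becomes nonzero.
    codes = np.abs(normalized[..., None] - LEVELS).argmin(axis=-1)
    return codes.astype(np.uint8), scales

def dequantize_w2(codes, scales):
    """Reconstruct approximate weights from 2-bit codes and per-group scales."""
    return LEVELS[codes] * scales

codes, scales = quantize_w2([0.4, -0.1, 0.05, -0.4])
print(codes)                         # [[3 1 2 0]]
print(dequantize_w2(codes, scales))  # approximately [[ 0.4 -0.2  0.2 -0.4]]
```

Note how the small weights `-0.1` and `0.05` are pushed to `±1 × scale` rather than to zero; whether and how this relates to the collapse behavior the article mentions is not something this sketch establishes.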