Analysis · AI Models
23 hours ago
RoPE-Aware Bit Allocation for KV-Cache Quantization
A new paper proposes a KV-cache quantization method that exploits the 2D frequency-block structure of rotary position embeddings (RoPE). It assigns bits to each component according to its contribution to attention, which improves accuracy at low bit-widths.
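To make the idea concrete, here is a minimal sketch of importance-weighted bit allocation over RoPE's 2D blocks. It is not the paper's algorithm: the summary does not give the importance measure or the allocation rule, so the energy proxy, the greedy budget loop, the interleaved pairing of dimensions (2i, 2i+1), and the names `allocate_bits` and `quantize_pairs` are all assumptions made for illustration.

```python
import numpy as np

def allocate_bits(keys, avg_bits=3, min_bits=1, max_bits=8):
    """Greedy per-block bit allocation under an average-bit budget (illustrative)."""
    n_tokens, head_dim = keys.shape
    n_pairs = head_dim // 2
    # Assumption: RoPE rotates interleaved pairs (2i, 2i+1); some implementations
    # pair dimension i with i + head_dim/2 instead.
    blocks = keys.reshape(n_tokens, n_pairs, 2)
    # Assumption: mean squared magnitude of a block stands in for its
    # contribution to attention.
    energy = (blocks ** 2).sum(axis=2).mean(axis=0)
    bits = np.full(n_pairs, min_bits, dtype=int)
    for _ in range((avg_bits - min_bits) * n_pairs):
        # Uniform quantization error shrinks roughly 4x per extra bit, so give
        # the next bit to the block with the largest estimated remaining error.
        err = energy * 4.0 ** (-bits)
        err[bits >= max_bits] = -np.inf
        bits[np.argmax(err)] += 1
    return bits

def quantize_pairs(keys, bits):
    """Min-max uniform quantization with one bit-width per 2D block."""
    per_dim_bits = np.repeat(bits, 2)      # both dims of a block share a width
    steps = 2.0 ** per_dim_bits - 1        # number of quantization intervals
    lo, hi = keys.min(axis=0), keys.max(axis=0)
    scale = np.where(hi > lo, (hi - lo) / steps, 1.0)
    codes = np.round((keys - lo) / scale)
    return codes * scale + lo              # dequantized reconstruction

# Synthetic keys whose per-block magnitude decays across blocks.
rng = np.random.default_rng(0)
scales = np.repeat(np.logspace(0, -2, 8), 2)
keys = rng.normal(size=(256, 16)) * scales
bits = allocate_bits(keys, avg_bits=3)
recon = quantize_pairs(keys, bits)
```

In this sketch, high-energy blocks receive more bits than low-energy ones while the total stays at `avg_bits` per block on average. A real method would likely measure importance against the queries rather than key energy alone.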