RoPE-Aware Bit Allocation for KV-Cache Quantization

23 hours ago

A new paper proposes a KV-cache quantization method that exploits the 2D frequency-block structure of RoPE (rotary position embeddings). Bits are assigned to each component according to its contribution to attention, which improves accuracy at low bit-widths.
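To make the idea concrete, here is a minimal sketch of contribution-based bit allocation over RoPE's 2D blocks. It is not the paper's algorithm, which the summary does not spell out. Everything in it is an assumption made for illustration: the function names (rope_pair_energy, allocate_bits, quantize_pairs) are hypothetical, RoPE blocks are taken to be adjacent dimension pairs (2i, 2i+1) (some implementations pair i with i + d/2 instead), mean squared magnitude stands in for "contribution to attention", and the quantizer is plain per-block min/max uniform quantization.

```python
import numpy as np


def rope_pair_energy(keys):
    """Mean squared norm of each 2D block, averaged over tokens.

    Assumption: block i is the adjacent pair (2i, 2i+1); energy is only a
    stand-in for whatever contribution measure the paper uses.
    """
    n_tokens, head_dim = keys.shape
    pairs = keys.reshape(n_tokens, head_dim // 2, 2)
    return (pairs ** 2).sum(axis=-1).mean(axis=0)


def allocate_bits(energy, avg_bits, min_bits=1, max_bits=8):
    """Greedy allocation under a total budget of avg_bits per block.

    Every block starts at min_bits. Each remaining bit goes to the block
    with the largest modeled error, energy / 4**bits, since one extra bit
    of uniform quantization cuts squared error by roughly a factor of 4.
    """
    n_blocks = energy.shape[0]
    bits = np.full(n_blocks, min_bits, dtype=int)
    budget = int(round(avg_bits * n_blocks)) - min_bits * n_blocks
    for _ in range(budget):
        gain = energy / (4.0 ** bits)
        gain[bits >= max_bits] = -np.inf  # block is already at the cap
        best = int(np.argmax(gain))
        if not np.isfinite(gain[best]):
            break  # every block is capped; leave the rest unspent
        bits[best] += 1
    return bits


def quantize_pairs(keys, bits):
    """Quantize and dequantize each block with its own bit-width."""
    n_tokens, head_dim = keys.shape
    pairs = keys.reshape(n_tokens, head_dim // 2, 2)
    out = np.empty_like(pairs)
    for i, b in enumerate(bits):
        block = pairs[:, i, :]
        lo, hi = block.min(), block.max()
        levels = 2 ** int(b) - 1
        scale = (hi - lo) / levels if hi > lo else 1.0
        out[:, i, :] = np.round((block - lo) / scale) * scale + lo
    return out.reshape(n_tokens, head_dim)


# Synthetic keys whose blocks differ widely in magnitude.
rng = np.random.default_rng(0)
block_scales = np.repeat(np.geomspace(4.0, 0.05, 8), 2)
keys = rng.normal(size=(256, 16)) * block_scales

bits = allocate_bits(rope_pair_energy(keys), avg_bits=3)
adaptive = quantize_pairs(keys, bits)
uniform = quantize_pairs(keys, np.full(8, 3))

print("bits per block:", bits)
print("adaptive MSE:", np.mean((keys - adaptive) ** 2))
print("uniform  MSE:", np.mean((keys - uniform) ** 2))
```

Both runs spend the same total number of bits. The adaptive run shifts them toward the high-energy blocks, which is the general effect the summary describes. A faithful implementation would replace the energy proxy with the paper's own contribution measure.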
