Analysis · AI Models

RoPE-Aware Bit Allocation for KV-Cache Quantization

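A new paper proposes a KV-cache quantization method that exploits the 2D frequency-block structure of rotary position embeddings (RoPE), in which each pair of dimensions is rotated together at its own frequency. The method allocates bits to each component according to its contribution to attention, which improves accuracy at low bit-widths.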
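The brief does not give the paper's allocation rule, so the following is only a minimal NumPy sketch of the general idea, not the authors' method. It assumes RoPE pairs are adjacent dimensions (2i, 2i+1), uses the mean squared norm of each key pair as a stand-in for "contribution to attention" (a rotation preserves the norm of its 2D pair, so this quantity does not depend on position), and hands out bits greedily under a fixed average budget. The function names `pair_energy`, `allocate_bits`, and `fake_quantize` are hypothetical.

```python
import numpy as np

def pair_energy(keys):
    """Mean squared norm of each 2D pair; keys has shape (tokens, head_dim).

    Assumption: RoPE rotates adjacent dimensions (2i, 2i+1) together.
    """
    tokens, dim = keys.shape
    pairs = keys.reshape(tokens, dim // 2, 2)
    return (pairs ** 2).sum(axis=(0, 2)) / tokens

def allocate_bits(energy, avg_bits, min_bits=1, max_bits=8):
    """Greedy allocation: every pair starts at min_bits, then each spare bit
    goes to the pair with the largest distortion proxy energy * 4**(-bits)."""
    n = energy.size
    bits = np.full(n, min_bits, dtype=int)
    budget = int(round(avg_bits * n)) - int(bits.sum())
    for _ in range(max(budget, 0)):
        score = energy * 4.0 ** (-bits.astype(float))
        score[bits >= max_bits] = -np.inf  # pair is already at the cap
        best = int(np.argmax(score))
        if not np.isfinite(score[best]):
            break  # every pair is at max_bits
        bits[best] += 1
    return bits

def fake_quantize(keys, bits):
    """Quantize then dequantize each pair with its own bit-width, using a
    uniform min/max grid per pair (a simple stand-in quantizer)."""
    tokens, dim = keys.shape
    pairs = keys.reshape(tokens, dim // 2, 2)
    lo = pairs.min(axis=(0, 2), keepdims=True)
    hi = pairs.max(axis=(0, 2), keepdims=True)
    levels = (2.0 ** bits - 1.0).reshape(1, -1, 1)
    step = np.where(hi > lo, (hi - lo) / levels, 1.0)
    q = np.round((pairs - lo) / step)
    return (q * step + lo).reshape(tokens, dim)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic keys: 8 pairs whose scale halves from one pair to the next.
    scales = np.repeat(2.0 ** np.arange(4, -4, -1), 2)
    keys = rng.normal(size=(256, 16)) * scales
    print(allocate_bits(pair_energy(keys), avg_bits=4))
```

With the same average budget, high-energy pairs end up with more bits than low-energy ones. A real implementation would likely weight pairs by query statistics as well, which this sketch leaves out.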

23 hours ago · AIBriefs