Analysis | Developers | July 2, 2026

MosaicKV: Dynamic KV cache compression for long-context LLM serving

MosaicKV uses a two-dimensional compression strategy to shrink the KV cache for prompts of hundreds of thousands to millions of tokens. The goal is to keep such prompts from exhausting GPU memory without sacrificing serving efficiency.
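To see why prompts of this length exhaust GPU memory, it helps to estimate the size of an uncompressed KV cache. Every layer stores one key vector and one value vector per KV head for every token. The sketch below does that arithmetic for a single sequence.

The model shape is a hypothetical assumption chosen for illustration and is not taken from the MosaicKV article: 32 layers, 8 KV heads (grouped-query attention), head dimension 128, and fp16 storage. The code does not show MosaicKV's compression method. It only shows the baseline that any such method has to reduce.

```python
def kv_cache_bytes(num_tokens, num_layers, num_kv_heads, head_dim, bytes_per_elem=2):
    """Size in bytes of an uncompressed KV cache for one sequence.

    The leading factor of 2 accounts for storing both keys and values
    at every layer, for every KV head, for every token.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem * num_tokens


# Hypothetical model shape (assumption, not from the MosaicKV article):
# 32 layers, 8 KV heads, head_dim 128, fp16 (2 bytes per element).
CONFIG = dict(num_layers=32, num_kv_heads=8, head_dim=128, bytes_per_elem=2)

GIB = 1024 ** 3

if __name__ == "__main__":
    # Compare a 128K-token prompt with a 1M-token prompt.
    for tokens in (131_072, 1_000_000):
        size = kv_cache_bytes(tokens, **CONFIG)
        print(f"{tokens:>9,} tokens -> {size / GIB:6.1f} GiB")
```

With this assumed shape, each token costs 128 KiB of cache. A 131,072-token prompt therefore needs 16 GiB, and a 1,000,000-token prompt needs about 122 GiB. The larger figure exceeds the memory of a single current accelerator before the model weights are even counted, which is the pressure that KV cache compression targets.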
