Analysis, AI Models

DeepSeek V4 Pro uses novel KV cache with sliding window attention

Together AI
@togethercompute

DeepSeek V4 Pro has a fundamentally different KV cache than any prior DeepSeek model. Sliding window attention, an indexer, and compression states all need to be stored correctly to get good cache reuse. To get it to run fast we didn't just rewrite the KV cache from scratch, we

2 hours ago
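
For readers unfamiliar with the term, here is a minimal sketch of what a sliding-window KV cache does in general: it keeps keys and values only for the most recent tokens and evicts older ones. This is a generic illustration, not Together AI's or DeepSeek's implementation. The class name, the `window` parameter, and the string placeholders for keys and values are all hypothetical, and the indexer and compression states mentioned in the post are not modeled.

```python
from collections import deque


class SlidingWindowKVCache:
    """Toy cache holding keys/values for only the last `window` tokens.

    Hypothetical illustration; not the DeepSeek V4 Pro cache layout.
    """

    def __init__(self, window):
        self.window = window
        # deque(maxlen=...) drops the oldest entry once it is full.
        self.keys = deque(maxlen=window)
        self.values = deque(maxlen=window)
        self.tokens_seen = 0

    def append(self, key, value):
        """Store the key/value pair for the next token position."""
        self.keys.append(key)
        self.values.append(value)
        self.tokens_seen += 1

    def visible_positions(self):
        """Absolute positions of the tokens still held in the window."""
        start = self.tokens_seen - len(self.keys)
        return list(range(start, self.tokens_seen))


cache = SlidingWindowKVCache(window=4)
for pos in range(6):
    # Placeholder strings stand in for real key/value tensors.
    cache.append(f"k{pos}", f"v{pos}")

print(cache.visible_positions())
print(list(cache.keys))
```

After six tokens with a window of four, positions 0 and 1 have been evicted. Reusing a cached prefix therefore requires knowing which absolute positions the window still covers, which is one reason such state has to be stored carefully.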