2 hours ago
DeepSeek V4 Pro uses novel KV cache with sliding window attention

Together AI
@togethercompute
Accelerate inference, model shaping, and pre-training on a research-optimized platform.
San Francisco, CA · together.ai

DeepSeek V4 Pro has a fundamentally different KV cache than any prior DeepSeek model. Sliding window attention, an indexer, and compression states all need to be stored correctly to get good cache reuse. To get it to run fast, we didn't just rewrite the KV cache from scratch, we…
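
For readers unfamiliar with the term, here is a minimal sketch of what a sliding-window KV cache does in general. It is a toy illustration, not DeepSeek's or Together AI's implementation: the class name `SlidingWindowKVCache`, the `window` parameter, and the string stand-ins for key/value tensors are all assumptions made for the example, and the indexer and compression states mentioned in the post are not modeled.

```python
from collections import deque


class SlidingWindowKVCache:
    """Toy per-layer cache that retains only the most recent `window` tokens.

    Hypothetical illustration only; real caches store tensors, not strings.
    """

    def __init__(self, window: int):
        if window <= 0:
            raise ValueError("window must be positive")
        self.window = window
        # deque(maxlen=...) evicts the oldest entry automatically on overflow.
        self.entries = deque(maxlen=window)  # items: (position, key, value)
        self.next_pos = 0  # absolute position of the next token

    def append(self, key, value):
        """Store the key/value pair for the next token position."""
        self.entries.append((self.next_pos, key, value))
        self.next_pos += 1

    def visible_positions(self):
        """Absolute positions that a new query token could still attend to."""
        return [pos for pos, _, _ in self.entries]


cache = SlidingWindowKVCache(window=4)
for t in range(6):
    cache.append(key=f"k{t}", value=f"v{t}")

print(cache.visible_positions())  # [2, 3, 4, 5]
```

With a window of 4, the entries for positions 0 and 1 are evicted once six tokens have been appended, so memory stays bounded no matter how long the sequence grows. The flip side is that any state needed to resume from a cached prefix has to be saved explicitly, which is one plausible reading of why the post stresses storing everything "correctly" for cache reuse.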
