DeepSeek V4 tops coding benchmarks but trails frontier by 8 months

3 days ago

DeepSeek V4 scores 80.6 on SWE-bench Verified and 93.5 on LiveCodeBench, placing it among the top performers on both coding benchmarks. Yet CAISI rates it roughly eight months behind frontier models across a broad set of domains.
