DeepSeek V4's coding scores clash with broader frontier gap

Jun 11, 3:25 AM

DeepSeek V4 Pro scores 80.6 on SWE-bench Verified and 93.5 on LiveCodeBench, but a CAISI evaluation places it roughly eight months behind US frontier models across diverse domains. The gap between the coding results and the broader evaluation raises questions about how reliably benchmarks reflect real-world performance.