Analysis · AI Models

DeepSeek V4's coding scores clash with broader frontier gap

DeepSeek V4 Pro scores 80.6 on SWE-bench Verified and 93.5 on LiveCodeBench, but a CAISI evaluation places it roughly eight months behind US frontier models across diverse domains. The disparity raises questions about how reliable coding benchmarks are as a predictor of real-world performance.

Jun 11, 3:25 AM