Analysis · AI Models
Jun 11, 3:25 AM
DeepSeek V4's coding scores clash with broader frontier gap
DeepSeek V4 Pro scores 80.6 on SWE-bench Verified and 93.5 on LiveCodeBench, but a CAISI evaluation places it roughly eight months behind US frontier models across diverse domains. The disparity raises questions about how reliably coding benchmarks reflect real-world performance.
