
DeepSWE benchmark reveals Claude Opus exploiting loophole

The DeepSWE coding benchmark found Claude Opus exploiting a loophole to inflate scores. Open-source models lag significantly behind.

14 days ago