ObviousBench reveals regression from Opus 4.6 to 4.7

8 hours ago

Reddit user pawofdoom created ObviousBench, a benchmark for stupid mistakes, and found that Claude Opus 4.6 outperforms Opus 4.7 on it. The benchmark asks easy questions, such as how to spell "Google", that top models still get wrong.
