ObviousBench reveals regression from Opus 4.6 to 4.7
Reddit user pawofdoom created ObviousBench, a benchmark that measures stupid mistakes, and found that Claude Opus 4.6 outperforms Opus 4.7 on it. The benchmark asks easy questions that top models still get wrong, such as how to spell "Google".