Jun 27, 8:16 PM
New benchmark ObviousBench shows regression from Opus 4.6 to 4.7
ObviousBench evaluates how reliably models avoid visible errors across 8 categories, scored with a pass^3 reliability metric. Its creator reports that Claude Opus 4.7 scores worse than Opus 4.6 on this benchmark.
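
The summary does not define pass^3. The sketch below assumes it follows the common pass^k convention, in which a task counts only if all k independent attempts succeed, so one visible error in any of three runs fails the task. The function names and the sample data are hypothetical and are not taken from ObviousBench itself.

```python
from math import comb


def pass_hat_k(num_trials: int, num_successes: int, k: int = 3) -> float:
    """Estimate the chance that k attempts on one task ALL succeed.

    Assumption: the usual pass^k estimator, C(successes, k) / C(trials, k).
    With exactly k trials this is 1.0 if every trial passed, else 0.0.
    """
    if k > num_trials:
        raise ValueError("need at least k trials per task")
    return comb(num_successes, k) / comb(num_trials, k)


def benchmark_pass_hat_k(results: dict[str, list[bool]], k: int = 3) -> float:
    """Average the per-task pass^k estimate over all tasks."""
    scores = [pass_hat_k(len(trials), sum(trials), k) for trials in results.values()]
    return sum(scores) / len(scores)


# Hypothetical per-task outcomes (True = no visible error in that attempt).
sample_results = {
    "task_a": [True, True, True],    # all three attempts clean
    "task_b": [True, False, True],   # one slip, so the task scores 0
}
overall = benchmark_pass_hat_k(sample_results, k=3)
```

Under this reading, a model that is right two times out of three on every task would score 0 rather than roughly 0.67, which is why a pass^k metric rewards consistency more than average accuracy.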