
New benchmark ObviousBench shows regression from Opus 4.6 to 4.7

ObviousBench evaluates how well models avoid visible errors across 8 categories, scored with a pass^3 reliability metric. Its creator reports that Claude Opus 4.7 regressed relative to Opus 4.6 on the benchmark.
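
The brief does not spell out how pass^3 is computed. As a rough illustration only, the sketch below assumes the common reading of pass^k: a task counts as solved only if all k independent attempts pass. The function names and the sample data are hypothetical and are not taken from ObviousBench.

```python
def pass_power_k(outcomes: dict[str, list[bool]], k: int = 3) -> float:
    """Fraction of tasks where all k attempts passed (assumed pass^k reading)."""
    if not outcomes:
        raise ValueError("no tasks supplied")
    solved = 0
    for task, attempts in outcomes.items():
        if len(attempts) != k:
            raise ValueError(f"task {task!r} needs exactly {k} attempts")
        if all(attempts):
            solved += 1
    return solved / len(outcomes)


def mean_pass_rate(outcomes: dict[str, list[bool]]) -> float:
    """Plain per-attempt pass rate, shown only for contrast."""
    total = sum(len(attempts) for attempts in outcomes.values())
    passed = sum(sum(attempts) for attempts in outcomes.values())
    return passed / total


# Hypothetical results: four tasks, three attempts each.
sample = {
    "task_a": [True, True, True],
    "task_b": [True, True, False],
    "task_c": [True, False, True],
    "task_d": [True, True, True],
}

print(pass_power_k(sample))    # strict: only task_a and task_d count
print(mean_pass_rate(sample))  # lenient: every passing attempt counts
```

Under this reading, a model that fails once in a while scores much lower on pass^3 than on a per-attempt average, which is why such a metric is described as measuring reliability.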

Jun 27, 8:16 PM
New benchmark ObviousBench shows regression from Opus 4.6 to 4.7 — AIBriefs