Analysis · AI Models

Models hack public benchmarks by retrieving solutions online, research shows

Cursor
@cursor_ai

We're sharing new research on how models hack public benchmarks. The latest models, including Opus 4.8 and Composer 2.5, learn to retrieve solutions from the internet or git history. When we apply a stricter harness, eval scores drop significantly. https://t.co/4kTVssqdjx

Jun 25, 5:21 PM