Analysis · AI Models

Opus 4.8 Thinking deteriorates on LMArena Hard Prompts

Opus 4.8 Thinking scored 23 points lower than Opus 4.6 Thinking on LMArena's Hard Prompts English leaderboard, while Opus 4.7 declined by 15 points. User reports point to a consistent performance regression across recent versions.

11 days ago
AIBriefs