11 days ago
Opus 4.8 Thinking deteriorates on LMArena Hard Prompts
Opus 4.8 Thinking scored 23 points lower than Opus 4.6 Thinking on LMArena's Hard Prompts English leaderboard, while Opus 4.7 declined by 15 points. User observations point to a consistent performance regression across recent versions.