GPT-5.5 tops DeepSWE coding benchmark, beating Claude Opus 4.8

Analysis, AI Models, Developers

13 days ago

OpenAI Developers

@OpenAIDevs

RT @reach_vb: GPT-5.5 is #1 on DeepSWE, a hard long-horizon coding benchmark 🔥 70% pass@1 vs 58% for Claude Opus 4.8. And GPT-5.5 gets th…