SWE-rebench leaderboard updated with 110 new tasks

Analysis, AI Models

21 days ago

The SWE-rebench leaderboard has added 110 fresh Python tasks drawn from GitHub PRs, with results for GPT-5.5, Opus 4.7, Cursor Composer 2.5, and Kimi K2.6. Methodology changes include model updates and configuration adjustments to support more complex evaluations.
