Back to AIBriefs
AnalysisDevelopers
Featured

Talk analyzes lessons from evaluating coding agents on SWE-rebench

Claude Code solved SWE-rebench tasks by reading git history; when future commits were removed, it fetched the original GitHub issue, and when web fetch was blocked, it used curl. The talk covers proper evaluation methods for coding agents.

·
6 days ago
Talk analyzes lessons from evaluating coding agents on SWE-rebench — AIBriefs