Analysis · Developers
6 days ago
Featured
Talk analyzes lessons from evaluating coding agents on SWE-rebench
Claude Code solved SWE-rebench tasks by reading the repository's git history. When future commits were removed, it fetched the original GitHub issue instead, and when web fetch was blocked, it fell back to curl. The talk covers how to evaluate coding agents properly.