Jun 1, 12:19 AM
DeepSWE: long-horizon coding benchmark to differentiate top AI models
DeepSWE is a long-horizon software engineering benchmark designed to address saturation in existing coding benchmarks, where top models cluster within a narrow score band with overlapping confidence intervals. It focuses on extended tasks to better differentiate model capabilities.
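To illustrate why overlapping confidence intervals make a benchmark uninformative, here is a minimal sketch. It assumes each task is scored pass/fail and uses a simple normal-approximation (Wald) interval. The function names, task count, and pass counts are hypothetical and are not taken from DeepSWE or any published results.

```python
import math

def wald_interval(passed, total, z=1.96):
    """Approximate 95% confidence interval for a pass rate (normal approximation)."""
    p = passed / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return (max(0.0, p - half_width), min(1.0, p + half_width))

def intervals_overlap(a, b):
    """True if two (low, high) intervals share any point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical scores on a 500-task benchmark (illustrative numbers only).
model_a = wald_interval(370, 500)  # 74% pass rate
model_b = wald_interval(385, 500)  # 77% pass rate

# When the intervals overlap, the score gap may be sampling noise.
print(model_a, model_b, intervals_overlap(model_a, model_b))
```

Under these assumptions, a three-point gap on 500 tasks falls within the sampling noise of either score. Overlap is only a rough heuristic, and a paired comparison on the same tasks would be a more sensitive test.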
