DeepSWE benchmark tests frontier models' coding abilities

17 hours ago

DeepSWE is a new coding benchmark that evaluates frontier models on contamination-free tasks written from scratch. It spans 91 repositories across five programming languages, giving the evaluation broad coverage. The benchmark aims to provide a more reliable measure of how well models can actually code.