Analysis · AI Models
17 hours ago
DeepSWE benchmark tests frontier models' coding abilities
DeepSWE is a new coding benchmark that evaluates frontier models on contamination-free tasks written from scratch. Its tasks span 91 repositories across 5 languages, and the benchmark aims to provide a more reliable measure of models' coding ability.
