Analysis · AI Models
Jun 24, 2:03 AM
DeepSWE benchmark evaluates frontier models on real code tasks
DeepSWE covers 91 repositories across 5 languages. Its tasks are written from scratch, which keeps the benchmark free of training-data contamination and gives a more realistic assessment of coding capabilities.
