OpenMythos benchmarks published, showing SWE-bench gap vs Qwen 3.6 27B


Jun 23, 6:56 PM


Benchmark results for the OpenMythos model have been published, and its SWE-bench score differs from the official numbers reported for Qwen 3.6 27B. The two sets of results are not directly comparable: the Qwen team used a different eval harness and ran on a filtered set of benchmark problems, which likely accounts for the difference.
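To illustrate why a filtered problem set matters, here is a minimal sketch with made-up data. The problem ids, the outcomes, and the `pass_rate` helper are all hypothetical; none of it comes from the OpenMythos or Qwen evaluations. It only shows that the same per-problem results can yield a different headline score once some problems are excluded.

```python
def pass_rate(results, keep=None):
    """Fraction of resolved problems, optionally restricted to the ids in `keep`."""
    ids = [pid for pid in results if keep is None or pid in keep]
    if not ids:
        raise ValueError("no problems left after filtering")
    return sum(results[pid] for pid in ids) / len(ids)


# Hypothetical per-problem outcomes (True = resolved); not real benchmark data.
results = {"p1": True, "p2": True, "p3": False, "p4": False, "p5": True}

# Hypothetical filtered subset that drops one unresolved problem (p4).
filtered = {"p1", "p2", "p3", "p5"}

full_score = pass_rate(results)                # 3 resolved out of 5 problems
filtered_score = pass_rate(results, filtered)  # 3 resolved out of 4 problems

print(f"full set: {full_score:.2f}, filtered set: {filtered_score:.2f}")
```

In this toy case the model's behavior is identical in both runs, yet the reported score moves, which is why numbers from different problem sets should not be read side by side.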
