Analysis · AI Models

METR: Over half of SWEBench results are unmergeable slop

Swyx
@swyx

It's finally out!!! @METR_Evals found that more than half of SWEBench results are unmergeable slop. FrontierCode represents 1000+ hours of maintainer-validated software engineering work that most frontier models cannot yet solve, much less solve with high quality. Cog had IOI

·
9 days ago
METR: Over half of SWEBench results are unmergeable slop — AIBriefs