Launch · Developers

Cog ships enterprise evals with up to 100-hour benchmarks

Swyx
@swyx

Finally! the first eval ship from cog!!!!!!!!!! 👼🏼 To contextualize: @METR_Evals cap out at ~16 hours. Cog has private enterprise evals up to 100hrs, and is confident enough to put a financial guarantee on it 🤯 METR dataset: ML eng, GPU kernels, cybersecurity > "METR (2026)

6 days ago