Launch · Developers
6 days ago
Cog ships enterprise evals with up to 100-hour benchmarks

swyx
@swyx
achieve ambition with intentionality, intensity, integrity & insanity. affiliations: @dxtipshq, @cognition, @temporalio, @aidotengineer, @latentspacepod
san francisco / singapore · swyx.io

Swyx
@swyx
Finally! the first eval ship from cog!!!!!!!!!!
To contextualize: @METR_Evals cap out at ~16 hours. Cog has private enterprise evals up to 100hrs, and is confident enough to put a financial guarantee on it 🤯
METR dataset: ML eng, GPU kernels, cybersecurity
> "METR (2026)
