Analysis · AI Models
6 days ago
Featured
Benchmarking agents: ARC AGI 3 and the measurement gap
ARC AGI 3 launched with every task solvable by humans, yet frontier models score under 1%. Vincent Chen argues that AI measurement has fallen behind AI building, and that benchmarks must bet on future capabilities.