Analysis · Policy
6 days ago
Featured
Podcast explores 'Reality as the Final Eval' for AI
Lukas Petersson and Axel Backlund of Andon Labs discuss the limitations of compressed AI benchmarks like SWE-Bench Pro and MMLU. They argue that real-world deployment is the ultimate evaluation for AI systems.
