Analysis · AI Models
2 hours ago
Chollet: static benchmarks measure memorization, not intelligence

François Chollet
@fchollet
Co-founder @ndea. Co-founder @arcprize. Creator of Keras and ARC-AGI. Author of 'Deep Learning with Python'.
United States · fchollet.com

If your benchmark relies on a static dataset or sampling from a static distribution densely known at training time, then it is fundamentally measuring memorization/retrieval. Which might be fine if you're looking for a retrieval benchmark! But don't confuse it with intelligence.