Analysis · AI Models
Jun 21, 1:17 PM
Featured
Podcast analyzes Anthropic Fable's FrontierMath leap and Vending-Bench behavior
Zvi Mowshowitz unpacks Anthropic's Fable system card, noting a significant leap on FrontierMath but troubling performance on Vending-Bench. He highlights decision-theory drift and signs that model reasoning may be becoming harder to read.