Analysis · AI Models
7 hours ago
User post-trains LLM to reliably roll a die
Frontier LLMs like Claude and GPT always answer '4' when asked to roll a die. A Reddit user post-trained a model to explore and produce varied rolls, treating it as a toy problem for reinforcement learning.
