Analysis · Developers
7 days ago
Featured
Cursor explains why offline RL comes before online RL
Federico Cassano argues that online RL only works when the model is already strong: offline RL first bakes in reasoning and tool calling, and online RL then adds the final polish. The video details Cursor's two-stage RL strategy.