
Video covers fixes to on-policy distillation and reward model derivations

Nathan Lambert
@natolambert.bsky.social

I'm doing Q&A videos as I roll through my course. Here's the next one, covering subtle fixes to the on-policy distillation and reward model derivations, common notation traps in this math, and added resources for going deeper (e.g., John Schulman's KL estimation blog): youtu.be/gB-bYUECpzE
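The post points to John Schulman's KL estimation blog as further reading. What follows is a minimal Python sketch of the three Monte Carlo estimators it describes (k1, k2, k3). It assumes samples x are drawn from q, that we want KL(q || p), and that r = p(x) / q(x). The function names `kl_estimators` and `gaussian_logpdf` and the two-Gaussian check are illustrative assumptions. They are not taken from the video or the course.

```python
import math
import random


def gaussian_logpdf(x: float, mu: float, sigma: float) -> float:
    """Log density of N(mu, sigma^2) at x (illustrative helper)."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def kl_estimators(logq: list[float], logp: list[float]) -> dict[str, float]:
    """Monte Carlo estimates of KL(q || p) from samples x_i ~ q.

    logq[i] and logp[i] are log q(x_i) and log p(x_i) for the same sample.
    """
    if len(logq) != len(logp) or not logq:
        raise ValueError("need equal-length, non-empty log-prob lists")
    k1 = k2 = k3 = 0.0
    for lq, lp in zip(logq, logp):
        log_r = lp - lq                    # log r, where r = p(x) / q(x)
        k1 += -log_r                       # unbiased, high variance, can be negative
        k2 += 0.5 * log_r ** 2             # biased, low variance, never negative
        k3 += math.expm1(log_r) - log_r    # (r - 1) - log r: unbiased, never negative
    n = len(logq)
    return {"k1": k1 / n, "k2": k2 / n, "k3": k3 / n}


if __name__ == "__main__":
    random.seed(0)
    # Assumed toy setup: q = N(0, 1), p = N(0.1, 1), so the true KL(q || p) is 0.1**2 / 2 = 0.005.
    xs = [random.gauss(0.0, 1.0) for _ in range(100_000)]
    logq = [gaussian_logpdf(x, 0.0, 1.0) for x in xs]
    logp = [gaussian_logpdf(x, 0.1, 1.0) for x in xs]
    print(kl_estimators(logq, logp))  # all three should land near 0.005
```

In this sketch, k1 averages to the right value but its individual terms can go negative. k3 keeps k1's unbiasedness and adds the control variate r - 1, whose expectation under q is zero. That makes every term non-negative and lowers variance when p and q are close.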

4 hours ago