Video covers fixes to on-policy distillation and reward model derivations

Analysis | AI Models

4 hours ago

@natolambert.bsky.social

A LLN - large language Nathan - (RL, RLHF, society, robotics), athlete, yogi, chef. Writes http://interconnects.ai. Prev Ai2/Olmo, HuggingFace, Berkeley, and normal places.

Nathan Lambert

I'm doing Q&A videos as I roll through my course. Here's the next one, covering subtle fixes to the on-policy distillation and reward model derivations, common notation traps when doing this math, and more resources for going deeper (e.g. John Schulman's KL estimation blog): youtu.be/gB-bYUECpzE
