4 hours ago
Video covers fixes to on-policy distillation and reward model derivations
Nathan Lambert
@natolambert.bsky.social
A LLN (large language Nathan): RL, RLHF, society, robotics. Athlete, yogi, chef. Writes http://interconnects.ai. Prev Ai2/Olmo, HuggingFace, Berkeley, and normal places.
I'm doing Q&A videos as I roll through my course. Here's the next one, covering subtle fixes to the on-policy distillation and reward model derivations, common notation traps when doing this math, and additional resources for going deeper (e.g. John Schulman's KL estimation blog post). youtu.be/gB-bYUECpzE