Analysis · AI Models

Lambert calls on-policy distillation a lasting post-training method

Nathan Lambert
@natolambert.bsky.social

On-policy distillation is on track to be a lasting method in post-training. The list of areas would be:

Instruction tuning (SFT/IFT)
RLHF
Direct Preference Optimization (DPO et al.)
RLVR
On-policy Distillation (OPD)

New classes of methods are rare! Excited to play.

28 days ago