
New papers advance on-policy distillation for LLMs

Five arXiv papers propose methods to improve on-policy distillation for LLMs. SEAD uses entropy-guided supervision, Self-Distilled Policy Gradient applies self-distillation, and LARK selects trajectories by learnability. The remaining papers address distribution alignment and the decay of supervision fidelity.
