Analysis · AI Models
22 hours ago
New papers advance on-policy distillation for LLMs
Five arXiv papers propose methods to improve on-policy distillation for LLMs. SEAD uses entropy-guided supervision, Self-Distilled Policy Gradient applies self-distillation, and LARK selects trajectories by learnability. The others address distribution alignment and supervision fidelity decay.