Analysis · AI Models
9 days ago
New papers advance on-policy distillation for LLM reasoning
Several arXiv papers propose methods such as near-future guidance, dynamic token selection, and trajectory selection to improve LLM reasoning through on-policy knowledge distillation. The techniques address supervision fidelity decay and cross-tokenizer distillation, a sign of growing research interest in the area.