Analysis · AI Models
7 days ago
Trajectory-Aware RL for Diffusion Language Models
Proposes a trajectory-aware reinforcement learning method for diffusion language models that uses the denoising trace (the confidence dynamics of tokens) to guide policy updates. The approach leverages the iterative unmasking process to improve generation quality beyond standard policy gradient methods.
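To make the idea concrete, here is a minimal sketch of how a denoising trace might reweight a policy-gradient update. It is not the method from the work summarized above: the weighting rule (tokens with lower average confidence over the trace get more weight) and the names `trajectory_weights` and `weighted_pg_loss` are assumptions chosen purely for illustration.

```python
def trajectory_weights(conf_trace):
    """Turn a denoising trace into one weight per token.

    conf_trace[t][i] is the (hypothetical) model confidence for token i
    at denoising step t. Assumed rule: tokens that stayed uncertain for
    longer receive a larger weight. Weights are normalized to mean 1.
    """
    num_steps = len(conf_trace)
    num_tokens = len(conf_trace[0])
    raw = []
    for i in range(num_tokens):
        mean_conf = sum(conf_trace[t][i] for t in range(num_steps)) / num_steps
        raw.append(1.0 - mean_conf + 1e-6)  # small constant avoids all-zero weights
    total = sum(raw)
    return [w * num_tokens / total for w in raw]


def weighted_pg_loss(token_logps, advantage, weights):
    """REINFORCE-style surrogate loss with per-token weights.

    With all weights equal to 1 this reduces to the usual
    -advantage * mean(log-probabilities).
    """
    n = len(token_logps)
    return -advantage * sum(w * lp for w, lp in zip(weights, token_logps)) / n


# Two tokens, three denoising steps: token 0 is confident throughout,
# token 1 only becomes confident at the last step.
trace = [[0.9, 0.2], [0.9, 0.6], [0.9, 1.0]]
weights = trajectory_weights(trace)
loss = weighted_pg_loss([-1.0, -2.0], advantage=1.0, weights=weights)
```

In this toy setup the late-resolving token contributes more to the update than the token that was confident from the first step, which is one plausible reading of "trajectory-aware"; the actual weighting used by the proposed method may differ.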