Analysis · AI Models

ARBOR: Online process rewards improve LLM search agents

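ARBOR introduces a reusable rubric buffer that supplies online, process-level rewards to LLM-based search agents. It targets a failure mode of outcome-only rewards: in outcome-homogeneous groups, where every rollout ends with the same outcome, the reward no longer distinguishes one trajectory from another. Scoring the intermediate search steps gives the agent finer-grained supervision during the search process.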
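To make the degeneration on outcome-homogeneous groups concrete, here is a minimal sketch. It is not ARBOR's implementation: it assumes a group-relative baseline (each reward centered on its group mean), and the function names, the per-step rubric scores, and the simple averaging are all hypothetical choices made for illustration.

```python
from statistics import mean


def group_advantages(rewards):
    """Center each reward on the group mean (assumed group-relative baseline)."""
    baseline = mean(rewards)
    return [r - baseline for r in rewards]


def process_reward(step_scores):
    """Average hypothetical per-step rubric scores (each in [0, 1]) for one rollout."""
    return mean(step_scores)


# Four rollouts for one query that all end with the same outcome (all wrong).
outcome_rewards = [0.0, 0.0, 0.0, 0.0]
outcome_only = group_advantages(outcome_rewards)

# Made-up rubric scores for two search steps per rollout.
step_scores = [[1.0, 0.0], [1.0, 1.0], [0.0, 0.0], [0.0, 1.0]]
combined = [o + process_reward(s) for o, s in zip(outcome_rewards, step_scores)]
with_process = group_advantages(combined)

print(outcome_only)   # every advantage is zero: no learning signal
print(with_process)   # rollouts are now ranked by step quality
```

Under these assumptions the outcome-only advantages are all zero, while the step-level scores restore a ranking among rollouts that share the same final outcome.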

8 days ago