Process-reward optimization advances computer use agents

Analysis | AI Models

1 day ago

Two papers propose process-reward optimization for training computer use agents (CUAs), addressing the limitations of sparse rewards and costly live-environment interaction. Methods such as filtered behavior cloning and multi-granularity reward models improve agent performance on complex digital workflows.

PRO-CUA: Process-Reward Optimization for Computer Use Agents | 20 days ago | Yifei He, Rui Yang, Hao Bai, Tong Zhang, Han Zhao
