Analysis | AI Models
1 day ago
Process-reward optimization advances computer use agents
Two papers propose process-reward optimization for training computer use agents (CUAs), addressing the limitations of sparse rewards and costly interaction with live environments. Methods such as filtered behavior cloning and multi-granularity reward models improve agent performance on complex digital workflows.
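To make the idea of filtered behavior cloning with step-level (process) rewards concrete, here is a minimal sketch of the filtering stage only. It is not the method of either paper, whose details are not given here. The `Step` type, the `filter_steps` helper, the 0.5 threshold, and the `toy_score` function (a stand-in for a learned process reward model) are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One recorded agent step (hypothetical structure for illustration)."""
    observation: str  # e.g. a textual description of the screen
    action: str       # e.g. "click('submit')"


def filter_steps(
    trajectory: List[Step],
    score_step: Callable[[Step], float],
    threshold: float = 0.5,  # assumed cutoff, not taken from the papers
) -> List[Step]:
    """Keep only the steps whose process reward meets the threshold."""
    return [step for step in trajectory if score_step(step) >= threshold]


def toy_score(step: Step) -> float:
    """Hypothetical stand-in for a learned process reward model."""
    return 0.0 if step.action == "noop" else 1.0


trajectory = [
    Step("login page", "type('user')"),
    Step("login page", "noop"),
    Step("login page", "click('submit')"),
]

# The surviving (observation, action) pairs would then serve as
# supervised targets for ordinary behavior cloning.
kept = filter_steps(trajectory, toy_score)
print([step.action for step in kept])
```

Because each step is scored individually, the filter gives a training signal even when the episode-level outcome is sparse, and it runs on logged trajectories without further interaction with a live environment.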