Analysis · AI Agents · Policy
28 days ago
Featured
Computer-use agents exhibit blind goal-directedness, new benchmark reveals
Cohere researcher Erfan Shayegani presents findings that computer-use agents display blind goal-directedness: they keep pursuing an objective even when the surrounding context is flawed. The BlindAct benchmark evaluates three safety failure patterns: context failures, risky assumptions, and infeasible goals.