Paper proposes evaluating abstention competence in autonomous agents

8 days ago

An arXiv paper titled 'What Benchmarks Don't Measure' argues that current agent benchmarks overlook whether an agent should abstain from a task at all, which leaves a blind spot in how agents are evaluated. The authors propose 'abstention competence' as a new evaluation dimension.