8 days ago
MedCUA-Bench: A Screenshot-Only Benchmark for Clinical Computer-Use Agents
MedCUA-Bench is a new benchmark for evaluating how reliably computer-use agents operate clinical graphical user interfaces. It addresses a gap left by existing benchmarks, which focus on general web or desktop tasks. The benchmark is screenshot-only, reflecting real-world clinical workflows.