Launch · Developers · AI Agents
9 days ago
Arena launches Agent Arena for real-world agent evaluation
Agent Arena evaluates agent performance on millions of live user sessions with real tasks, moving beyond static benchmarks. It focuses on causal evaluation of agents that use tools and carry out long-horizon tasks.
