Launch · AI Models
12 days ago
Agent Arena launches causal evaluation leaderboard for real-world agents
The leaderboard uses causal tracing, which treats agent components as treatments in a randomized controlled trial to estimate the net improvement attributable to each. It analyzes millions of in-the-wild interactions from users working on software engineering, financial analysis, and other tasks.
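As a rough illustration of the randomized-trial idea, the sketch below randomly assigns a hypothetical agent component to simulated interactions and estimates its net improvement as a difference in mean success rates. This is not Agent Arena's actual method or code: the function names, the simulated data, and the `base_rate` and `true_lift` parameters are all assumptions made for the example.

```python
import random
from statistics import mean


def estimate_net_improvement(treated_outcomes, control_outcomes):
    """Difference-in-means estimate: mean treated outcome minus mean control outcome."""
    return mean(treated_outcomes) - mean(control_outcomes)


def simulate_trial(n_interactions=10_000, base_rate=0.50, true_lift=0.10, seed=0):
    """Simulate interactions where a component is switched on at random.

    All numbers here are made up for illustration; they are not leaderboard data.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n_interactions):
        # Random assignment is what lets the difference in means be read causally.
        use_component = rng.random() < 0.5
        p_success = base_rate + (true_lift if use_component else 0.0)
        success = 1 if rng.random() < p_success else 0
        (treated if use_component else control).append(success)
    return treated, control


treated, control = simulate_trial()
effect = estimate_net_improvement(treated, control)
print(f"estimated net improvement: {effect:.3f}")
```

Because assignment is random, the two groups differ on average only in whether the component was active, so the gap in success rates can be attributed to the component rather than to differences in users or tasks.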
