14 days ago
ITBench-AA benchmark finds frontier models below 50% on IT tasks
Created by Artificial Analysis and IBM Research, ITBench-AA tests models on site reliability engineering (SRE) tasks. Frontier models such as GPT-4o and Claude 4.5 scored under 50%, highlighting gaps in agentic AI for enterprise IT.
