ITBench-AA benchmark finds frontier models below 50% on IT tasks

14 days ago

Created by Artificial Analysis and IBM Research, ITBench-AA evaluates models on site reliability engineering (SRE) tasks. Frontier models, including GPT-4o and Claude 4.5, each scored under 50%, pointing to gaps in agentic AI for enterprise IT work.

Artificial Analysis and IBM Research are launching ITBench-AA, the first in a new series of...

Source: Artificial Analysis
