ArcANE benchmark tests role-playing agents' character consistency

6 days ago

ArcANE is a new benchmark for role-playing language agents. It uses a dataset drawn from fanfiction and novels to test whether agents keep a character consistent across story chapters. The authors also provide an evaluation model that reaches 79% agreement with human judgments on the test set.
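
To make the agreement figure concrete, here is a minimal sketch of one common way such a number is computed: the share of test items where the evaluator's label matches the human label. The summary does not say how ArcANE measures agreement, so the simple percent-agreement metric, the function name `agreement_rate`, and the label values below are all assumptions for illustration, not the authors' method or data.

```python
def agreement_rate(model_labels, human_labels):
    """Fraction of items where the evaluator's label equals the human label.

    Hypothetical helper: plain percent agreement, assumed for illustration.
    """
    if len(model_labels) != len(human_labels):
        raise ValueError("label lists must have the same length")
    if not model_labels:
        raise ValueError("need at least one labeled item")
    matches = sum(m == h for m, h in zip(model_labels, human_labels))
    return matches / len(model_labels)


# Made-up labels for illustration only; these are not ArcANE data.
human = ["consistent", "inconsistent", "consistent", "consistent", "inconsistent"]
model = ["consistent", "inconsistent", "inconsistent", "consistent", "inconsistent"]

rate = agreement_rate(model, human)
print(f"Agreement with human judgments: {rate:.0%}")
```

Percent agreement is easy to read but does not correct for matches that would occur by chance, so a chance-corrected statistic may tell a different story on imbalanced label sets.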
