6 days ago
ArcANE benchmark tests role-playing agents' character consistency
ArcANE is a new benchmark for role-playing language agents. It uses a dataset drawn from fanfiction and novels to test whether an agent keeps a character consistent across story chapters. The authors also provide an evaluation model that achieves 79% agreement with human judgments on the test set.