CollabBench benchmark measures LLM collaboration with diverse players

6 days ago

CollabBench is a new benchmark that evaluates the collaborative ability of LLM agents through grounded interactions with simulated human partners. It includes diverse player types and requires agents to engage proactively, going beyond simple conversational collaboration.