Analysis · AI Models
6 days ago
CollabBench benchmark measures LLM collaboration with diverse players
CollabBench is a new benchmark that evaluates how well LLM agents collaborate, using grounded interactions with simulated human partners. It covers diverse player types and requires agents to engage proactively, going beyond simple conversational collaboration.