Launch · AI Models
16 hours ago
Evalatro: open benchmark where LLMs play Balatro
Evalatro is an open benchmark that tests LLMs by having them play the actual Balatro card game in real time. It started as a personal project for getting LLM advice on levels and has since grown into a full evaluation suite.
