LLM judges biased toward own family, Mistral penalizes its own

14 hours ago

In a blind-grading study of 55 LLMs covering 22k judgments, judge models tend to favor models from their own family: Qwen judges favor Qwen models by about 0.9 points. Mistral is the only family that reverses the pattern, penalizing its own models by about 1.0 point.
