Analysis · AI Models
21 hours ago
LLM judge panels suffer correlated errors, leaving only about two effective votes
Apple research shows that a nine-model LLM-as-a-judge panel has only about two effective independent votes, because the judges' errors are correlated. The findings call into question the reliability of using multiple LLMs for evaluation.
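
To give a feel for the numbers, here is a minimal sketch of how correlation shrinks a panel's effective size. It assumes the textbook case where every pair of judges shares the same error correlation `rho`, which may not be how the Apple researchers computed their figure. The function names are hypothetical, and the correlation it prints is only what this assumption implies for "nine judges, about two effective votes", not a value reported in the research.

```python
def effective_votes(n_judges: int, rho: float) -> float:
    """Effective number of independent votes for n_judges whose errors
    all share the same pairwise correlation rho (illustrative assumption)."""
    return n_judges / (1 + (n_judges - 1) * rho)


def implied_correlation(n_judges: int, n_eff: float) -> float:
    """Invert the formula above: the pairwise correlation that would
    reduce n_judges to n_eff effective votes."""
    return (n_judges / n_eff - 1) / (n_judges - 1)


# Nine judges behaving like roughly two independent voters.
rho = implied_correlation(9, 2.0)
print(rho)                      # 0.4375
print(effective_votes(9, rho))  # 2.0
print(effective_votes(9, 0.0))  # 9.0 (fully independent judges)
```

Under this assumption, a moderate pairwise correlation of about 0.44 is enough to make nine judges count as two, and adding more judges with the same correlation raises the effective count only slowly.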
