Analysis · AI Models

AI eval papers using LLM-as-a-judge criticized for relying on deprecated models

Abeba Birhane
@abeba.blacksky.app

reading AI eval papers using llm-as-a-judge is alarming for many reasons; many “scientific claims” rely on evals executed on models that are now deprecated. how are we supposed to treat “findings” that are inherently not reproducible? i despair. llms are accelerating the decay of scientific rigor

Jun 24, 12:45 PM