AI eval papers using LLM-as-a-judge criticized for relying on deprecated models

Jun 24, 12:45 PM

@abeba.blacksky.app

Founder & PI @aial.ie. Assistant Professor of AI, School of Computer Science & Statistics, @tcddublin.bsky.social. AI accountability, AI audits & evaluation, critical data studies. Cognitive scientist by training. Ethiopian in Ireland. She/her.

Abeba Birhane

reading AI eval papers using llm-as-a-judge is alarming for many reasons; many “scientific claims” rely on evals executed on models that are now deprecated. how are we supposed to treat “findings” that are inherently not reproducible? i despair. llms are accelerating the decay of scientific rigor
