Jun 24, 12:45 PM
AI eval papers using LLM-as-a-judge criticized for relying on deprecated models
Dr Abeba Birhane
@abeba.blacksky.app
Founder & PI @aial.ie. Assistant Professor of AI, School of Computer Science & Statistics, @tcddublin.bsky.social
AI accountability, AI audits & evaluation, critical data studies. Cognitive scientist by training. Ethiopian in Ireland. She/her
reading AI eval papers using llm-as-a-judge is alarming for many reasons; many “scientific claims” rely on evals executed on models that are now deprecated. how are we supposed to treat “findings” that are inherently not reproducible? i despair. llms are accelerating the decay of scientific rigor