Five frontier LLMs disagree on 67% of fact-check claims

Analysis · AI Models · Policy

16 days ago

In a study of 1,000 real-world fact-check claims, five leading LLMs (GPT-4, Claude 3, Gemini 1.5, Llama 3, and Mistral Large) disagreed on 67% of them, or roughly 670 claims. The result raises questions about how reliable these models are for fact-checking.
