Analysis · AI Models · Policy

Five frontier LLMs disagree on 67% of fact-check claims

A study of 1,000 real-world fact-check claims found that five leading LLMs (GPT-4, Claude 3, Gemini 1.5, Llama 3, and Mistral Large) disagreed on 67% of them, or about 670 claims. The result raises doubts about relying on any single model's verdict for fact-checking.

16 days ago
Five frontier LLMs disagree on 67% of fact-check claims — AIBriefs