
General-purpose large language models outperform specialized clinical AI tools on…

In a Nature Medicine study, GPT-5.2, Gemini 3.1 Pro, and Claude Opus 4.6 outperformed OpenEvidence and UpToDate Expert AI across MedQA, HealthBench, and real clinical queries. On the real-world questions, the clinical AI tools performed on par with Google Search AI Overview. The authors call for independent testing before clinical use.