General-purpose large language models outperform specialized clinical AI tools on…


Jun 12, 12:00 AM


In a Nature Medicine study, GPT-5.2, Gemini 3.1 Pro, and Claude Opus 4.6 outperformed OpenEvidence and UpToDate Expert AI across MedQA, HealthBench, and real clinical queries. Clinical AI tools matched Google Search AI Overview on real-world questions. The authors call for independent testing before clinical use.

General-purpose large language models outperform specialized clinical AI tools on medical benchmarks
9 days ago
Eric Karl Oermann
