Analysis · AI Models · Health
6 days ago
GRPO with variance-aware rubric rewards boosts heart-focused medical QA
The paper introduces variance-aware rubric rewards with GRPO to improve LLM accuracy on cardiology-related medical questions, achieving significant gains over standard supervised fine-tuning. The method addresses both answer correctness and confidence calibration without requiring additional annotated data.