New studies benchmark and mitigate sycophancy in LLMs

Analysis | Policy | AI Models

5 days ago

Multiple new arXiv papers propose benchmarks (SICI, BenSyc, Janus) and interventions (adversarial arbitration, probabilistic blending) to measure and reduce LLM sycophancy. The work highlights sycophancy as a persistent alignment challenge across model scales and languages.

Durable Evaluation Framework: Adversarial Arbitration for Sycophancy Reduction in Large Language Models | 5 days ago | Sam Ryan

"I understand your perspective": LLM Persuasion and Sycophancy through the Lens of Communicative Action Theory | 5 days ago | Esra Dönmez, Agnieszka Falenska

Janus: A Benchmark for Goal-Conditioned Information Distortion in LLMs | 4 days ago | Polydoros Giannouris, Mohsinul Kabir, Sophia Ananiadou

Dual-Stance Evaluation of Sycophancy: The Structure of Agreement and the Limits of Intervention | 3 days ago | Matthew James Buchan

The AI Epistemic Deference Index: A Continuous Measure of Sycophancy | 5 days ago | Alejandro Botas, Paul de Font-Reaulx, Luke Hewitt

To Intervene or Not: Guiding Inference-time Alignment with Probabilistic Model Blending | 3 days ago | Jin Gan, Xin Li, Jun Luo
