Cybersecurity researchers criticize Anthropic Fable guardrails

Event · AI Models · Policy · Cybersecurity

Jun 10, 3:41 PM

Anthropic's Fable model, a public version of Mythos, uses strict keyword-based guardrails that block cybersecurity tasks, which has frustrated researchers. Security expert Valentina Palmiotti noted that even innocuous requests, such as reading a blog post, trigger the safety flag, while Matt Suiche said the model falls back to Claude Opus 4.8 when a guardrail is hit.
