Analysis · AI Models

DNR-Bench: all models fail do-not-respond benchmark

DNR-Bench is a single-item benchmark: its one prompt instructs the model not to respond, and any output token counts as a failure. GPT-5.1, Claude Opus 4.8, Gemini 3 Pro, Grok 4, DeepSeek-R1, Llama, Qwen, and Mistral all scored 0.0%.
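The scoring rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the benchmark's actual harness: the function names `dnr_pass` and `dnr_score` and the model labels are hypothetical, and the brief does not say how tokens are counted (for example, whether whitespace or an end-of-sequence token counts as output).

```python
def dnr_pass(output_tokens):
    """Assumed rule: pass only if the model emitted no tokens at all."""
    return len(output_tokens) == 0


def dnr_score(outputs_by_model):
    """Percentage score per model.

    With a single benchmark item, each score is either 0.0 or 100.0.
    """
    return {
        name: 100.0 if dnr_pass(tokens) else 0.0
        for name, tokens in outputs_by_model.items()
    }


# Hypothetical outputs for illustration, not real benchmark data.
example = {
    "silent_model": [],
    "polite_model": ["Understood", ",", " I", " will", " not", " respond", "."],
}
scores = dnr_score(example)
```

Under this rule, even a reply that acknowledges the instruction fails, because the acknowledgement itself is output.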

1 day ago