Analysis · AI Models · Policy
6 hours ago
Norm-preserving abliteration on Qwen3.6-35B-A3B achieves 0% refusal with intact benchmarks
The technique removes refusal behavior from Qwen3.6-35B-A3B while preserving benchmark scores. It builds on the refusal-direction method of Arditi et al. (2024), and the dataset has been open-sourced.
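For readers unfamiliar with the approach, here is a minimal sketch of the general idea, not the implementation used for this model. It assumes the refusal direction is the normalized difference of mean activations between harmful and harmless prompts, as in Arditi et al. (2024), and that "norm-preserving" means rescaling each edited weight vector back to its original length. That second point is an assumption, since the summary does not spell it out. The function names and toy vectors are hypothetical.

```python
import math


def refusal_direction(harmful_acts, harmless_acts):
    """Unit vector pointing from the harmless mean activation to the harmful mean."""
    dim = len(harmful_acts[0])
    mean_harmful = [sum(v[i] for v in harmful_acts) / len(harmful_acts) for i in range(dim)]
    mean_harmless = [sum(v[i] for v in harmless_acts) / len(harmless_acts) for i in range(dim)]
    diff = [a - b for a, b in zip(mean_harmful, mean_harmless)]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]


def ablate_norm_preserving(write_vectors, direction):
    """Project `direction` out of each vector, then restore the original length.

    `write_vectors` stands in for the vectors a weight matrix writes into the
    residual stream. The rescaling step is an assumed reading of
    "norm-preserving", not a confirmed detail of the reported technique.
    """
    edited = []
    for w in write_vectors:
        original_norm = math.sqrt(sum(x * x for x in w))
        overlap = sum(x * d for x, d in zip(w, direction))
        projected = [x - overlap * d for x, d in zip(w, direction)]
        new_norm = math.sqrt(sum(x * x for x in projected))
        # A vector lying entirely along the direction collapses to zero.
        scale = original_norm / new_norm if new_norm > 0 else 0.0
        edited.append([x * scale for x in projected])
    return edited


# Toy 2-D activations, purely illustrative.
direction = refusal_direction([[2.0, 0.0], [4.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]])
edited = ablate_norm_preserving([[3.0, 4.0]], direction)
```

After the edit, the vector has no component along the estimated direction but keeps its original length, which is the property the "norm-preserving" label suggests.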
