Analysis · AI Models
25 days ago
Nous Research Introduces CNA for Sparse MLP Circuit Steering
CNA identifies the neurons responsible for refusal behavior in instruction-tuned language models (LMs) without training a sparse autoencoder (SAE) or modifying model weights. It uses these neuron attributions to steer the model through a sparse set of MLP neurons.
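The summary does not describe CNA's internals. The sketch below illustrates only the general idea of attribution-guided neuron steering and is not CNA's actual method. It assumes a hypothetical toy one-hidden-layer MLP whose scalar output stands in for a "refusal score". Each hidden neuron is scored by activation × gradient. The top-scoring neurons are then suppressed at inference time by masking their activations, so the weights are never changed. All function names and numbers are hypothetical.

```python
from typing import List, Sequence


def relu(v: float) -> float:
    return v if v > 0.0 else 0.0


def hidden_activations(W1: Sequence[Sequence[float]], b1: Sequence[float],
                       x: Sequence[float]) -> List[float]:
    # Toy MLP hidden layer: h = relu(W1 @ x + b1)
    return [relu(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]


def refusal_score(h: Sequence[float], w2: Sequence[float]) -> float:
    # Hypothetical linear readout standing in for a "refusal" direction
    return sum(hi * wi for hi, wi in zip(h, w2))


def neuron_attributions(h: Sequence[float], w2: Sequence[float]) -> List[float]:
    # Activation x gradient: for a linear readout, d(score)/d(h_i) = w2[i],
    # so each attribution is exactly neuron i's contribution to the score.
    return [hi * wi for hi, wi in zip(h, w2)]


def top_k_neurons(attr: Sequence[float], k: int) -> List[int]:
    # Indices of the k neurons that push the score up the most
    return sorted(range(len(attr)), key=lambda i: attr[i], reverse=True)[:k]


def steer(h: Sequence[float], neurons: Sequence[int], scale: float = 0.0) -> List[float]:
    # Inference-time intervention: rescale only the chosen activations;
    # the weights W1, b1, w2 are left untouched.
    chosen = set(neurons)
    return [hi * scale if i in chosen else hi for i, hi in enumerate(h)]
```

With `W1 = [[1, 0], [0, 1], [1, 1], [-1, 0]]`, zero biases, `w2 = [0.5, -1.0, 2.0, 3.0]` and input `[1.0, 2.0]`, the score is 4.5. Neuron 2 carries the largest attribution (6.0), and masking only that neuron drops the score to -1.5. Because the intervention operates on activations rather than weights, it can be switched on or off per forward pass.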
