Analysis · AI Models · Policy
7 days ago
Expert-Aware Refusal Steering enhances LLM refusal capabilities
The paper introduces Expert-Aware Refusal Steering, a method that applies steering vectors to make LLMs more reliable at refusing harmful requests. The approach aims to increase safety while maintaining helpfulness.