Analysis · AI Models
Jun 18, 4:00 AM
Papers analyze SAE reliability and sparsity, propose new variants
Several new arXiv papers examine the reliability of sparse autoencoders (SAEs) and the effects of sparsity, and propose improved variants. 'Rational Sparse Autoencoder' learns sparsity mechanisms, while 'Cosine-Scored SAEs' addresses norm inflation. 'SAE Interventions are Unreliable' warns that suppressing features does not prevent the behavior from recovering.