Papers analyze SAE reliability, sparsity, and propose new variants


Jun 18, 4:00 AM


Several new arXiv papers examine the reliability of sparse autoencoders (SAEs) and the effects of sparsity, and propose improved variants. 'Rational Sparse Autoencoder' proposes an SAE that learns its sparsity mechanism, while 'Cosine-Scored SAEs' address norm inflation. 'SAE Interventions are Unreliable' warns that suppressing SAE features does not prevent the model from recovering the targeted behavior.
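To make the terms above concrete, here is a minimal pure-Python sketch of a generic ReLU sparse autoencoder. It is not the architecture of any paper listed here. It shows encoding, decoding, a feature-suppression intervention (zeroing one latent before decoding), and a hypothetical cosine score. The cosine scoring assumes that "cosine-scored" means ranking latents by cos(w_i, x) instead of the raw dot product, so the score does not grow with the norm of the activation. All function names and toy weights are illustrative.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def relu(v):
    return [max(0.0, a) for a in v]


def encode(x, w_enc, b_enc):
    """Generic ReLU SAE encoder: z_i = max(0, <w_i, x> + b_i)."""
    return relu([dot(w, x) + b for w, b in zip(w_enc, b_enc)])


def decode(z, w_dec, b_dec):
    """Reconstruct x_hat = sum_i z_i * d_i + b_dec (d_i = rows of w_dec)."""
    out = list(b_dec)
    for zi, d in zip(z, w_dec):
        out = [o + zi * di for o, di in zip(out, d)]
    return out


def cosine_scores(x, w_enc):
    """Hypothetical cosine scoring: cos(w_i, x), unchanged when x is rescaled."""
    nx = norm(x)
    return [dot(w, x) / (norm(w) * nx) for w in w_enc]


def suppress(z, idx):
    """Feature-suppression intervention: zero latent idx before decoding."""
    return [0.0 if i == idx else zi for i, zi in enumerate(z)]


if __name__ == "__main__":
    # Toy 2-D example with identity encoder/decoder weights (illustrative only).
    w = [[1.0, 0.0], [0.0, 1.0]]
    b_enc, b_dec = [0.0, 0.0], [0.0, 0.0]
    x = [2.0, 1.0]
    z = encode(x, w, b_enc)
    print("latents:", z)
    print("reconstruction:", decode(z, w, b_dec))
    print("after suppressing latent 0:", decode(suppress(z, 0), w, b_dec))
    print("cosine scores, x:", cosine_scores(x, w))
    print("cosine scores, 10x:", cosine_scores([10 * a for a in x], w))
```

With identity weights and x = [2.0, 1.0], the latents are [2.0, 1.0], and decoding after suppressing latent 0 gives [0.0, 1.0]. Scaling x by 10 scales the dot-product pre-activations tenfold but leaves the cosine scores unchanged. The toy shows only the mechanics of an intervention; it says nothing about whether a real model recovers the suppressed behavior afterward, which is the question the reliability paper raises.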

Effects of sparsity and superposition on loss in simple autoencoders · 4 days ago · Mriganka Basu Roy Chowdhury, Eric McLaughlin Weiner

Rational Sparse Autoencoder · 6 days ago · Naiyu Yin, Yue Yu

Size Doesn't Matter: Cosine-Scored Sparse Autoencoders · 6 days ago · Silen Naihin, Lev Stambler

Decompose Sparsely Where You Should, Absorb Densely Where You Should No · 7 days ago · Ruixuan Deng, Zehao Jin, Zekun Wang, Zihan Dong
