Analysis · AI Models
8 days ago
Paper asks if value vectors need residual context
The paper investigates whether value vectors in deep transformer layers depend on context from the residual stream. It finds that routing value computation through a separate pathway can maintain or improve performance while reducing computation.