Jun 17, 7:44 AM
MiniMax Sparse Attention (MSA) targets long-context quadratic cost
MiniMax has released MSA, a two-branch block-sparse attention method built on grouped-query attention (GQA), and trained it in a 109B-parameter mixture-of-experts (MoE) model with a 3T-token budget. The goal is to cut the cost of softmax attention, which grows quadratically with sequence length, for ultra-long contexts.
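To show why block sparsity helps, here is a minimal sketch of generic top-k block-sparse attention for a single query. It is not MSA: the article does not describe MSA's two branches or how it selects blocks, and GQA head sharing is not modeled. Scoring each block by its mean-pooled key, and every function name here (`block_sparse_attend`, `dense_score_count`, and so on), are illustrative assumptions. The idea is that each query attends to only `k_blocks * block_size` keys instead of all `n`, so the score count drops from `n * n` to roughly `n * k_blocks * block_size`.

```python
import math


def dense_score_count(n: int) -> int:
    """Query-key dot products for full (non-causal) attention over n tokens: O(n^2)."""
    return n * n


def block_sparse_score_count(n: int, block_size: int, k_blocks: int) -> int:
    """Dot products when each query only visits k_blocks blocks of block_size keys."""
    return n * min(n, k_blocks * block_size)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax_attend(q, keys, values):
    """Standard scaled softmax attention for one query over the given keys/values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def block_sparse_attend(q, keys, values, block_size, k_blocks):
    """Hypothetical top-k block-sparse attention for one query (not MSA's design).

    1. Split keys into contiguous blocks and summarize each block by its mean key.
    2. Score each summary against q and keep the k_blocks highest-scoring blocks.
    3. Run ordinary softmax attention over the keys in the selected blocks only.
    """
    n_blocks = math.ceil(len(keys) / block_size)
    block_scores = []
    for b in range(n_blocks):
        block = keys[b * block_size:(b + 1) * block_size]
        mean_key = [sum(col) / len(block) for col in zip(*block)]
        block_scores.append((dot(q, mean_key), b))
    # Take the top-k blocks, then restore positional order.
    chosen = sorted(b for _, b in sorted(block_scores, reverse=True)[:k_blocks])
    idx = [i for b in chosen
           for i in range(b * block_size, min((b + 1) * block_size, len(keys)))]
    return softmax_attend(q, [keys[i] for i in idx], [values[i] for i in idx])


if __name__ == "__main__":
    # 1M-token context, 16 blocks of 64 keys per query (illustrative numbers).
    print(dense_score_count(1_000_000))                  # 1000000000000
    print(block_sparse_score_count(1_000_000, 64, 16))   # 1024000000
```

With those illustrative settings, each query scores 1,024 keys instead of one million, roughly a 1000x reduction in score computations. If `k_blocks` covers every block, the sketch reduces exactly to dense attention. Block-level selection also keeps memory access contiguous, which is a common motivation for sparsifying at block rather than token granularity.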
