MiniMax Sparse Attention (MSA) targets long-context quadratic cost

Jun 17, 7:44 AM

MiniMax released MSA, a two-branch block-sparse attention method built on grouped-query attention (GQA) and trained in a 109B-parameter mixture-of-experts (MoE) model with a 3T-token budget. It aims to reduce the quadratic cost of softmax attention at ultra-long context lengths.
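
To make the "quadratic cost" concrete, here is a rough back-of-the-envelope sketch in Python. It is not MSA's actual algorithm: it assumes a generic block-sparse scheme in which every query scores against a fixed number of key blocks, and the function names, the block size of 64, and the 16 blocks per query are hypothetical values chosen only for illustration. Causal masking and any block-selection overhead are ignored.

```python
def dense_score_entries(seq_len: int) -> int:
    """Query-key score entries computed by full softmax attention (per head)."""
    return seq_len * seq_len


def block_sparse_score_entries(seq_len: int, block_size: int, blocks_per_query: int) -> int:
    """Score entries when each query attends to a fixed number of key blocks.

    Hypothetical scheme for illustration only (not taken from MSA): every
    query scores against `blocks_per_query` key blocks of `block_size` tokens.
    """
    num_blocks = -(-seq_len // block_size)  # ceiling division
    kept_blocks = min(blocks_per_query, num_blocks)
    # A query can never score against more keys than the sequence contains.
    return seq_len * min(kept_blocks * block_size, seq_len)


if __name__ == "__main__":
    for n in (8_192, 131_072, 1_048_576):
        dense = dense_score_entries(n)
        sparse = block_sparse_score_entries(n, block_size=64, blocks_per_query=16)
        print(f"n={n:>9,}  dense={dense:.3e}  sparse={sparse:.3e}  ratio={dense / sparse:,.0f}x")
```

Under these assumptions the dense count grows with the square of the sequence length, while the block-sparse count grows linearly, so the gap widens as the context gets longer.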
