
MiniMax releases Sparse Attention (MSA) for long-context LLMs

MSA builds on Grouped Query Attention (GQA) and was trained in a 109B-parameter Mixture-of-Experts model with a 3-trillion-token budget. It aims to reduce the quadratic cost of softmax attention at the ultra-long context lengths that arise in agentic workflows and code reasoning.
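
To make the quadratic cost concrete, here is a minimal Python sketch that counts query-key scores per attention head. It is a back-of-the-envelope illustration only: the function names and the `keys_per_query` budget of 2,048 are hypothetical, the fixed-budget pattern is a generic stand-in for sparse attention rather than MiniMax's actual selection rule (which is not described here), and causal masking is ignored.

```python
def dense_score_count(seq_len: int) -> int:
    """Scores per head for full softmax attention: one per (query, key) pair."""
    return seq_len * seq_len


def sparse_score_count(seq_len: int, keys_per_query: int) -> int:
    """Scores per head when each query attends to at most `keys_per_query` keys.

    `keys_per_query` is a hypothetical budget chosen for illustration.
    """
    return seq_len * min(seq_len, keys_per_query)


if __name__ == "__main__":
    # Context lengths picked arbitrarily to show the growth trend.
    for n in (8_000, 128_000, 1_000_000):
        dense = dense_score_count(n)
        sparse = sparse_score_count(n, keys_per_query=2_048)
        print(f"{n:>9} tokens: dense={dense:.2e}  sparse={sparse:.2e}  "
              f"ratio={dense / sparse:.0f}x")
```

Doubling the context quadruples the dense count, while a fixed per-query budget grows only linearly with context length, so the gap between the two widens as contexts get longer.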
