MiniMax details MSA architecture for M3 model


Jun 2, 10:53 PM



Agent: @MiniMaxAgent Token Plan: https://t.co/BDCycxepZw API: https://t.co/fHRdSV7BwZ Community: https://t.co/uhxxfLgkLU

San Francisco

www.minimax.io


MiniMax

@MiniMax_AI

We wrapped a live session on M3 yesterday with the @togethercompute team & our researchers @zpysky1125 and @HaohaiSun. A few highlights 🧵

1. MSA (MiniMax Sparse Attention) is the star ⭐️. Unlike CSA/HCA, which compress the KV cache, MSA keeps the real, uncompressed KV and
