Launch · Developers

vLLM adds support for MiniMax models with 1M-context serving

MiniMax
@MiniMax_AI

day-0 in @vllm_project and it comes with: dedicated MSA prefill/decode kernels, 1M-context serving with prefix caching + chunked prefill, BF16 + MXFP8 on both Hopper and Blackwell 🚀 this is what open-weight done properly looks like. thanks @vllm_project, @NVIDIAAI, @AIatAMD,

Jun 12, 9:16 PM