vLLM adds support for MiniMax models with 1M-context serving

Launch · Developers

Jun 12, 9:16 PM


Agent: @MiniMaxAgent Token Plan: https://t.co/BDCycxepZw API: https://t.co/fHRdSV7BwZ Community: https://t.co/uhxxfLgkLU

San Francisco · www.minimax.io


MiniMax

@MiniMax_AI

day-0 in @vllm_project and it comes with: dedicated MSA prefill/decode kernels, 1M-context serving with prefix caching + chunked prefill, BF16 + MXFP8 on both Hopper and Blackwell 🚀 this is what open-weight done properly looks like. thanks @vllm_project, @NVIDIAAI, @AIatAMD,
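For readers unfamiliar with the two serving features named in the post, here is a toy sketch of how prefix caching and chunked prefill can fit together. It is not vLLM's implementation: the function names (`block_hashes`, `plan_prefill`), the block size, and the chunk size are all hypothetical and chosen only for illustration. The idea is that a prompt is split into fixed-size blocks, each block is hashed together with everything before it, blocks already seen are skipped, and the remaining tokens are prefilled in bounded chunks.

```python
import hashlib
from typing import List, Set, Tuple

BLOCK_SIZE = 4  # toy value; an assumption for this sketch only


def block_hashes(tokens: List[int], block_size: int = BLOCK_SIZE) -> List[str]:
    """Hash each full block chained with the previous hash, so a block's
    hash identifies the entire prefix ending at that block."""
    hashes: List[str] = []
    prev = ""
    full = len(tokens) - len(tokens) % block_size  # ignore the partial tail
    for start in range(0, full, block_size):
        block = tokens[start:start + block_size]
        payload = prev + "|" + ",".join(map(str, block))
        prev = hashlib.sha256(payload.encode()).hexdigest()
        hashes.append(prev)
    return hashes


def plan_prefill(
    tokens: List[int],
    cache: Set[str],
    chunk_size: int,
    block_size: int = BLOCK_SIZE,
) -> Tuple[int, List[Tuple[int, int]]]:
    """Return (tokens served from cache, list of [start, end) prefill chunks)."""
    hashes = block_hashes(tokens, block_size)

    # Prefix caching: count leading blocks whose chained hash is already known.
    cached_blocks = 0
    for h in hashes:
        if h not in cache:
            break
        cached_blocks += 1
    cached_tokens = cached_blocks * block_size

    # Chunked prefill: split the uncached remainder into bounded pieces.
    chunks = [
        (start, min(start + chunk_size, len(tokens)))
        for start in range(cached_tokens, len(tokens), chunk_size)
    ]

    # Record this prompt's full blocks so later requests can reuse them.
    cache.update(hashes)
    return cached_tokens, chunks


cache: Set[str] = set()
first = list(range(10))
second = list(range(8)) + [99, 100, 101]  # shares the first 8 tokens
first_plan = plan_prefill(first, cache, chunk_size=4)
second_plan = plan_prefill(second, cache, chunk_size=4)
```

In this toy run the second prompt shares its first two blocks with the first one, so only its tail needs prefill. A real engine also has to manage KV-cache memory, eviction, and scheduling across requests, none of which is modeled here.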
