Launch · Developers
Jun 12, 9:16 PM
vLLM adds support for MiniMax models with 1M-context serving

MiniMax (official)
@minimax_ai
Agent: @MiniMaxAgent · Token Plan: https://t.co/BDCycxepZw · API: https://t.co/fHRdSV7BwZ · Community: https://t.co/uhxxfLgkLU
San Francisco · www.minimax.io

MiniMax
@MiniMax_AI
day-0 in @vllm_project and it comes with: dedicated MSA prefill/decode kernels, 1M-context serving with prefix caching + chunked prefill, BF16 + MXFP8 on both Hopper and Blackwell. this is what open-weight done properly looks like. thanks @vllm_project, @NVIDIAAI, @AIatAMD,