15 days ago
llama.cpp adds StepFun 3.5 multi-token prediction support
Pull request #23274 for llama.cpp introduces multi-token prediction (MTP) support for StepFun 3.5 models. With MTP, the model can propose several tokens per decoding step instead of one, which can reduce the number of sequential steps needed and so improve inference speed.
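
The announcement does not say how the pull request implements MTP. One common way to use multi-token prediction is draft-and-verify: a cheap predictor guesses several future tokens, and the main model keeps only the prefix it agrees with. The toy Python sketch below illustrates that general idea under stated assumptions. It is not llama.cpp code, and `next_token`, `draft_tokens`, `generate_baseline`, and `generate_mtp` are hypothetical names invented for this example.

```python
from typing import List, Tuple

VOCAB = 50  # toy vocabulary size (assumption for this sketch)


def next_token(context: List[int]) -> int:
    """Toy stand-in for the main model: a deterministic next token."""
    return (sum(context) * 31 + len(context)) % VOCAB


def draft_tokens(context: List[int], k: int) -> List[int]:
    """Toy stand-in for an MTP head: guess k future tokens.

    A wrong guess is injected at every fourth position so that
    verification has something to reject.
    """
    ctx = list(context)
    guesses = []
    for _ in range(k):
        tok = next_token(ctx)
        if len(ctx) % 4 == 3:
            tok = (tok + 1) % VOCAB  # deliberately wrong guess
        guesses.append(tok)
        ctx.append(tok)
    return guesses


def generate_baseline(prompt: List[int], n: int) -> Tuple[List[int], int]:
    """One token per step: n tokens cost n steps."""
    ctx = list(prompt)
    steps = 0
    while len(ctx) - len(prompt) < n:
        ctx.append(next_token(ctx))
        steps += 1
    return ctx[len(prompt):], steps


def generate_mtp(prompt: List[int], n: int, k: int) -> Tuple[List[int], int]:
    """Draft k tokens per step, keep the verified prefix."""
    ctx = list(prompt)
    steps = 0
    while len(ctx) - len(prompt) < n:
        steps += 1
        accepted: List[int] = []
        for tok in draft_tokens(ctx, k):
            # A real system would score all drafted positions in one
            # batched pass; here each position is checked in a loop.
            expected = next_token(ctx + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # replace the first wrong guess
                break
        ctx.extend(accepted)
    return ctx[len(prompt):][:n], steps


if __name__ == "__main__":
    base, base_steps = generate_baseline([1, 2, 3], 16)
    fast, fast_steps = generate_mtp([1, 2, 3], 16, k=4)
    print(base == fast, base_steps, fast_steps)
```

Because every kept token is checked against the main model, both functions produce the same sequence; the sketch only reduces the number of steps. Any real speedup depends on how often the drafted tokens are accepted and on the cost of drafting and verification.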
