Together AI optimizes GLM 5.1 inference with kernel rewrites


Jun 15, 11:59 PM

Together AI

@togethercompute

Accelerate inference, model shaping, and pre-training on a research-optimized platform.

San Francisco, CA · together.ai


Optimizing GLM 5.1 came down to three things:
-> Rewrote the indexer topk kernel
-> Fused the indexer kernel to reduce memory and launch overhead
-> Eliminated CPU overhead that was gating prefill throughput

The bigger win was in the indexer. Once we fixed that, the rest made…
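The post does not say how the indexer or its topk kernel works. The following is a minimal, hypothetical Python sketch of the general idea: an indexer scores every candidate key for a query, and a top-k step keeps only the highest-scoring positions. It contrasts an unfused version, which materializes every score before selecting, with a fused version that scores and selects in one pass using a size-k heap. The function names and the ReLU dot-product score are assumptions for illustration. This is not Together AI's kernel or GLM 5.1's actual indexer.

```python
import heapq
from typing import List, Sequence, Tuple


def indexer_score(q: Sequence[float], key: Sequence[float]) -> float:
    # Hypothetical indexer score: ReLU of a dot product (an assumption,
    # not GLM 5.1's actual scoring function).
    return max(sum(a * b for a, b in zip(q, key)), 0.0)


def topk_unfused(q: Sequence[float], keys: Sequence[Sequence[float]], k: int) -> List[int]:
    # Pass 1: materialize every score (O(n) extra memory).
    scores = [indexer_score(q, key) for key in keys]
    # Pass 2: full sort; higher score first, lower index wins ties.
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return order[:max(k, 0)]


def topk_fused(q: Sequence[float], keys: Sequence[Sequence[float]], k: int) -> List[int]:
    # Single pass: each score goes straight into a size-k min-heap, so at most
    # k candidates are held at once (O(k) extra memory, no score array).
    if k <= 0:
        return []
    heap: List[Tuple[float, int]] = []  # entries are (score, -index)
    for i, key in enumerate(keys):
        entry = (indexer_score(q, key), -i)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            # New entry beats the current worst survivor: swap it in.
            heapq.heapreplace(heap, entry)
    # Emit survivors by descending score, lower index first on ties.
    return [-neg_i for _, neg_i in sorted(heap, reverse=True)]


q = [1.0, 0.5]
keys = [[0.2, 0.1], [1.0, 1.0], [-1.0, 0.0], [0.5, 2.0], [1.0, 1.0]]
print(topk_unfused(q, keys, 3))  # [1, 3, 4]
print(topk_fused(q, keys, 3))    # [1, 3, 4]
```

Both functions return the same indices. The fused form trades a full sort for heap updates and never stores the per-key score array, which loosely mirrors why fusing scoring with selection can cut memory traffic. On a GPU, it would also remove the separate kernel launch between the two steps.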

Zai replaced the network architecture running GLM-5.1 inference and the gains are pretty wild (22 days ago, Scared-Biscotti2287)
