
Together AI optimizes GLM 5.1 inference with kernel rewrites

Together AI
@togethercompute

Optimizing GLM 5.1 came down to three things:
-> Rewrote the indexer topk kernel
-> Fused the indexer kernel to reduce memory and launch overhead
-> Eliminated CPU overhead that was gating prefill throughput

The bigger win was in the indexer. Once we fixed that, the rest made
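
For readers unfamiliar with the term, a top-k indexer step can be pictured as scoring candidate tokens against a query and keeping only the k highest-scoring positions. The sketch below is a plain-Python illustration under that assumption. The function name `indexer_topk` and the dot-product scoring are hypothetical and are not taken from Together AI's kernel, which the post does not describe.

```python
import heapq


def indexer_topk(query, keys, k):
    """Return the positions of the k keys that score highest against query.

    Hypothetical illustration: scores are plain dot products, and the
    result is ordered from highest score to lowest.
    """
    scores = [sum(q * x for q, x in zip(query, key)) for key in keys]
    # heapq.nlargest selects the k best positions without sorting every score.
    return heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)


query = [1.0, 0.0]
keys = [[0.2, 0.9], [0.9, 0.1], [0.5, 0.5], [-1.0, 0.0]]
selected = indexer_topk(query, keys, k=2)
print(selected)  # prints [1, 2]
```

A GPU kernel would do this selection in parallel across many queries. Fusing it with the scoring step, as the post's second point suggests, would avoid writing the full score list to memory and launching a separate kernel for the selection.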
