Analysis | AI Models | June 25, 2026

GLM 5.2's Index Share cuts compute 2.9x at 1M token context

At 1M tokens of context, GLM 5.2's Index Share technique requires 2.9x fewer compute operations than naive sparse attention. It cuts redundant work by reusing sparse attention indexers across four consecutive layers instead of running a separate indexer in every layer.
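Here is a minimal sketch of the layer-sharing idea in plain Python. It illustrates the concept and is not GLM 5.2's code. The dot-product scorer in `indexer_topk`, the `sparse_attend` helper, `run_layers`, and its `share_every` parameter are all hypothetical. The indexer selects the top-k key positions once per group of `share_every` layers, and the remaining layers in that group attend over the same positions.

```python
import math
import random

# Hypothetical toy, not GLM 5.2's implementation.

def indexer_topk(query, keys, k):
    """Assumed lightweight indexer: score every key by dot product, keep the top-k positions."""
    scores = [sum(q * x for q, x in zip(query, key)) for key in keys]
    return sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]

def sparse_attend(query, keys, values, idx):
    """Softmax attention restricted to the selected positions in idx."""
    scale = math.sqrt(len(query))
    logits = [sum(q * x for q, x in zip(query, keys[i])) / scale for i in idx]
    m = max(logits)  # subtract the max for numerical stability
    w = [math.exp(l - m) for l in logits]
    z = sum(w)
    dim = len(values[0])
    return [sum(w_j * values[i][d] for w_j, i in zip(w, idx)) / z for d in range(dim)]

def run_layers(query, layer_kv, k, share_every):
    """Stack sparse-attention layers. The indexer runs only on the first layer
    of each group of `share_every` layers; later layers reuse its indices."""
    indexer_calls = 0
    idx = None
    out = query
    for layer, (keys, values) in enumerate(layer_kv):
        if layer % share_every == 0:
            idx = indexer_topk(out, keys, k)
            indexer_calls += 1
        out = sparse_attend(out, keys, values, idx)  # each layer's output is the next query
    return out, indexer_calls

random.seed(0)
dim, seq_len, n_layers, k = 8, 256, 8, 16
layer_kv = [
    ([[random.gauss(0, 1) for _ in range(dim)] for _ in range(seq_len)],
     [[random.gauss(0, 1) for _ in range(dim)] for _ in range(seq_len)])
    for _ in range(n_layers)
]
query = [random.gauss(0, 1) for _ in range(dim)]

out_naive, calls_naive = run_layers(query, layer_kv, k, share_every=1)
out_shared, calls_shared = run_layers(query, layer_kv, k, share_every=4)
print(calls_naive, calls_shared)  # 8 2
```

With eight toy layers, sharing across four cuts indexer calls from 8 to 2. The sketch only counts indexer invocations. It does not model the article's 2.9x figure, which measures total compute at 1M tokens rather than indexer calls alone.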

1 source

AIBriefs