Analysis | AI Models | June 25, 2026
GLM 5.2's Index Share cuts compute 2.9x at 1M-token context

At a 1M-token context, GLM 5.2's Index Share technique needs 2.9x fewer compute operations than naive sparse attention. It gets there by reusing sparse attention indexers across four consecutive layers, so the same token selection is not recomputed at every layer.
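The reuse pattern can be sketched in a few lines of Python. This is a toy illustration, not GLM 5.2's implementation: the function names (`select_top_k`, `run_layers`), the random relevance scores, and the layer count, sequence length, and `k` are all assumptions made for the example. Only the group size of four comes from the article.

```python
import random

SHARE_GROUP = 4  # from the article: indexers are reused across four consecutive layers


def select_top_k(scores, k):
    """Stand-in indexer: return the positions of the k highest-scoring tokens."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def run_layers(num_layers, seq_len, k, share_group, seed=0):
    """Walk a layer stack and count how often the indexer runs.

    share_group=1 models the naive case where every layer runs its own indexer.
    """
    rng = random.Random(seed)
    indexer_calls = 0
    cached = None
    per_layer_indices = []
    for layer in range(num_layers):
        if layer % share_group == 0:
            # Hypothetical relevance scores; a real indexer would derive
            # these from the model's queries and keys.
            scores = [rng.random() for _ in range(seq_len)]
            cached = select_top_k(scores, k)
            indexer_calls += 1
        # Layers inside a group reuse the cached selection unchanged.
        per_layer_indices.append(cached)
    return indexer_calls, per_layer_indices


naive_calls, _ = run_layers(num_layers=8, seq_len=32, k=4, share_group=1)
shared_calls, shared_indices = run_layers(
    num_layers=8, seq_len=32, k=4, share_group=SHARE_GROUP
)
print(naive_calls, shared_calls)  # 8 2
```

The sketch counts indexer calls only. It does not model the rest of the attention computation, so it shows where the saving comes from rather than reproducing the reported 2.9x figure.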