GLM 5.2 cuts compute 2.9x with Index Share sparse attention

Jun 25, 12:00 AM

Index Share reuses sparse attention indexers across four layers, cutting compute by 2.9x at a 1M-token context length. The savings make long-context inference more affordable to serve in production.
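The core idea is to run the sparse-attention indexer once and let the next layers reuse its selection. A minimal sketch can illustrate it. Everything below is an assumption for illustration and not GLM 5.2's implementation. That includes the dot-product indexer, the hypothetical function names (`make_layers`, `indexer_scores`, `top_k`, `sparse_attention`, `run_layers`), and the single query vector shared across layers. The group size of 4 comes from the description above. The sketch counts only indexer invocations. It does not model the other costs behind the reported 2.9x saving.

```python
import math
import random


def make_layers(num_layers, num_tokens, dim, seed=0):
    """Hypothetical helper: random (keys, values) pairs for each layer."""
    rng = random.Random(seed)

    def vecs():
        return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_tokens)]

    return [(vecs(), vecs()) for _ in range(num_layers)]


def indexer_scores(q, keys):
    # Hypothetical lightweight indexer: one dot-product score per cached key.
    return [sum(a * b for a, b in zip(q, k)) for k in keys]


def top_k(scores, k):
    # Positions of the k highest-scoring keys.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def sparse_attention(q, keys, values, idx):
    # Scaled dot-product attention restricted to the selected key positions.
    scale = math.sqrt(len(q))
    logits = [sum(a * b for a, b in zip(q, keys[i])) / scale for i in idx]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - m) for x in logits]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w / total * values[i][d] for w, i in zip(weights, idx)) for d in range(dim)]


def run_layers(q, layer_kv, k, share_group=4):
    """Recompute top-k indices only on the first layer of each group of
    `share_group` layers; the remaining layers in the group reuse them."""
    indexer_calls = 0
    idx = []
    outputs = []
    for layer, (keys, values) in enumerate(layer_kv):
        if layer % share_group == 0:
            idx = top_k(indexer_scores(q, keys), k)
            indexer_calls += 1
        # Every layer still runs its own attention, over the shared selection.
        outputs.append(sparse_attention(q, keys, values, idx))
    return outputs, indexer_calls


if __name__ == "__main__":
    layers = make_layers(num_layers=8, num_tokens=16, dim=4)
    q = [1.0, 0.5, -0.5, 0.25]
    _, shared = run_layers(q, layers, k=4, share_group=4)
    _, per_layer = run_layers(q, layers, k=4, share_group=1)
    print(f"indexer calls: shared={shared}, per-layer={per_layer}")
```

With 8 layers and `share_group=4`, the indexer runs twice instead of eight times. Each layer still performs its own attention over the shared top-k keys. That is why the end-to-end saving is smaller than the 4x reduction in indexer calls.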
