Analysis | AI Models
Jun 25, 12:00 AM
GLM 5.2 cuts compute 2.9x with Index Share sparse attention
Index Share reuses sparse-attention indexers across four layers, cutting compute 2.9x at a 1M-token context. That makes long-context inference more affordable in production.
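As a rough illustration of the idea, here is a minimal sketch of index sharing, assuming the indexer is a top-k selector over key positions whose result is computed once and reused by the next layers in a group of four. The names `top_k_indices`, `sparse_attend`, `run_layers`, and `GROUP_SIZE` are hypothetical and not taken from GLM 5.2, and the toy counts indexer calls only; it does not reproduce the reported 2.9x figure.

```python
import math
import random

GROUP_SIZE = 4  # assumed: layers that share one indexer result ("four layers")


def top_k_indices(query, keys, k):
    """Hypothetical indexer: score every key against the query, keep the k best."""
    scores = [sum(q * x for q, x in zip(query, key)) for key in keys]
    return sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]


def sparse_attend(query, keys, values, indices):
    """Softmax attention restricted to the selected key positions."""
    scale = 1.0 / math.sqrt(len(query))
    logits = [scale * sum(q * x for q, x in zip(query, keys[i])) for i in indices]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    dim = len(values[0])
    return [
        sum(w * values[i][d] for w, i in zip(weights, indices)) / total
        for d in range(dim)
    ]


def run_layers(query, keys, values, num_layers, k, share=True):
    """Run a toy layer stack; return (final vector, number of indexer calls)."""
    indexer_calls = 0
    indices = None
    hidden = query
    for layer in range(num_layers):
        # With sharing, only the first layer of each group runs the indexer.
        if not share or layer % GROUP_SIZE == 0:
            indices = top_k_indices(hidden, keys, k)
            indexer_calls += 1
        hidden = sparse_attend(hidden, keys, values, indices)
    return hidden, indexer_calls


random.seed(0)
dim, seq_len = 8, 64
keys = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(seq_len)]
values = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(seq_len)]
query = [random.gauss(0, 1) for _ in range(dim)]

_, shared_calls = run_layers(query, keys, values, num_layers=8, k=16, share=True)
_, dense_calls = run_layers(query, keys, values, num_layers=8, k=16, share=False)
print(shared_calls, dense_calls)  # 2 8
```

In this toy, the indexer is the only step that touches every key position, so running it once per group instead of once per layer removes most of the full-sequence scoring work. How much of a real model's total compute that represents depends on details the summary does not give.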
