
NVIDIA releases Gated DeltaNet-2 linear attention layer

Gated DeltaNet-2 is a linear attention layer that decouples the erase and write gates. Trained at 1.3B parameters on 100B FineWeb-Edu tokens, it is reported to outperform Mamba-2, Gated DeltaNet, KDA, and Mamba-3 across the benchmark suite.

17 days ago