Analysis · AI Models

Nemotron hybrid Mamba-MoE model hits 504K-token retrieval

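The hybrid Mamba-MoE model achieves perfect needle-in-a-haystack retrieval at roughly half a million tokens (504K) on just four RTX 3090 GPUs, using about 71 GB of VRAM. Its Mamba (state-space model, SSM) layers carry a fixed-size recurrent state instead of a key-value (KV) cache that grows with every token. As a result, longer context adds little memory beyond what the model's attention layers need, which makes extra context nearly free.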
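To make the "nearly free context" point concrete, here is a back-of-envelope sketch comparing per-sequence context memory for an all-attention stack and a hybrid stack of the same depth. Every layer count and size below (`n_attn_layers`, `n_kv_heads`, `head_dim`, `ssm_state_elems`, the byte widths) is a hypothetical placeholder, not Nemotron's published configuration. The sketch also ignores model weights, activations, and Mamba's small convolution buffers.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass(frozen=True)
class HybridConfig:
    """Hypothetical layer counts and sizes; NOT Nemotron's actual config."""
    n_attn_layers: int    # layers that keep a KV cache
    n_ssm_layers: int     # Mamba/SSM layers with a fixed-size recurrent state
    n_kv_heads: int       # KV heads per attention layer (after grouped-query sharing)
    head_dim: int         # width of each key/value vector
    ssm_state_elems: int  # recurrent-state elements per SSM layer (conv buffers ignored)
    kv_bytes: int = 2     # bf16/fp16 cache entries
    ssm_bytes: int = 4    # fp32 state entries


def kv_cache_bytes(cfg: HybridConfig, seq_len: int) -> int:
    # One key and one value vector per token, per KV head, per attention layer,
    # so this term grows linearly with seq_len.
    return 2 * cfg.n_attn_layers * cfg.n_kv_heads * cfg.head_dim * seq_len * cfg.kv_bytes


def ssm_state_bytes(cfg: HybridConfig) -> int:
    # The SSM state is updated in place token by token, so its size
    # does not depend on sequence length at all.
    return cfg.n_ssm_layers * cfg.ssm_state_elems * cfg.ssm_bytes


def context_memory_bytes(cfg: HybridConfig, seq_len: int) -> int:
    # Per-sequence memory spent on context (excludes weights and activations).
    return kv_cache_bytes(cfg, seq_len) + ssm_state_bytes(cfg)


if __name__ == "__main__":
    # Same total depth (48 layers); only the attention/SSM split differs.
    all_attention = HybridConfig(48, 0, n_kv_heads=8, head_dim=128, ssm_state_elems=0)
    hybrid = HybridConfig(6, 42, n_kv_heads=8, head_dim=128, ssm_state_elems=64 * 64 * 128)
    for seq_len in (8_000, 128_000, 504_000):
        print(
            f"{seq_len:>7} tokens: "
            f"all-attention {context_memory_bytes(all_attention, seq_len) / GIB:6.2f} GiB, "
            f"hybrid {context_memory_bytes(hybrid, seq_len) / GIB:6.2f} GiB"
        )
```

With these made-up numbers, at 504,000 tokens the all-attention stack needs roughly 92 GiB for context alone, while the hybrid needs under 12 GiB, almost all of it from its few attention layers. The real ratio depends on the model's actual layer mix and cache precision.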

9 hours ago