Nemotron hybrid Mamba-MoE model hits 504K token retrieval

9 hours ago

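The hybrid Mamba+MoE model achieves perfect needle-in-a-haystack retrieval at roughly half a million tokens (504K) using only four RTX 3090 GPUs (approx. 71GB VRAM). The Mamba/SSM layers keep a fixed-size state instead of a KV cache that grows with sequence length, so for those layers extending the context adds almost no memory.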
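To make the memory argument concrete, here is a rough back-of-the-envelope sketch in Python. It compares how an attention KV cache scales with sequence length against the fixed recurrent state of an SSM layer. All layer counts, head sizes, and state dimensions below are hypothetical placeholders chosen for illustration, not the actual Nemotron configuration, and the formulas ignore activations, weights, and convolution state.

```python
def kv_cache_bytes(num_attn_layers, num_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Approximate KV cache size: one K and one V tensor per attention layer.

    Grows linearly with seq_len.
    """
    return 2 * num_attn_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem


def ssm_state_bytes(num_ssm_layers, d_inner, d_state, bytes_per_elem=2):
    """Approximate recurrent state size for SSM layers.

    Does not depend on sequence length at all.
    """
    return num_ssm_layers * d_inner * d_state * bytes_per_elem


if __name__ == "__main__":
    GIB = 1024 ** 3
    # Hypothetical model shapes (assumptions, not taken from the article).
    attn_cfg = dict(num_attn_layers=32, num_kv_heads=8, head_dim=128)
    ssm_cfg = dict(num_ssm_layers=32, d_inner=8192, d_state=128)

    for seq_len in (8_000, 64_000, 504_000):
        kv = kv_cache_bytes(seq_len=seq_len, **attn_cfg) / GIB
        ssm = ssm_state_bytes(**ssm_cfg) / GIB
        print(f"{seq_len:>7} tokens: KV cache ~{kv:.2f} GiB, SSM state ~{ssm:.3f} GiB")
```

Under these assumed shapes the KV cache figure climbs in step with the token count while the SSM figure stays constant, which is the effect the summary describes. A hybrid model that keeps a few attention layers would still pay the KV cost for those layers only.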