Subquadratic – Introducing SubQ 1.1 Small

Jun 16, 2:50 PM

SubQ 1.1 Small uses subquadratic sparse attention for long-context tasks, achieving near-perfect retrieval up to 12M tokens. At 1M tokens, it requires 64.5x less compute than dense attention and runs 56x faster than FlashAttention-2. The model is being deployed with select design partners.
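To give a feel for the scale of the 64.5x compute figure, here is a rough back-of-the-envelope sketch in Python. It assumes a deliberately simplified cost model in which attention cost is proportional to the number of query-key pairs scored, and in which a sparse scheme lets each query attend to a fixed number of keys. This is an illustration only: the announcement does not describe how SubQ's sparse attention actually works, and the function names and the fixed keys-per-query budget are hypothetical.

```python
# Illustrative cost model only; NOT the SubQ mechanism, which the
# announcement does not describe. All names here are hypothetical.

def dense_attention_pairs(n_tokens: int) -> int:
    """Query-key pairs scored by full (non-causal) dense attention: n^2."""
    return n_tokens * n_tokens


def sparse_attention_pairs(n_tokens: int, keys_per_query: int) -> int:
    """Pairs scored if every query attends to at most `keys_per_query` keys."""
    return n_tokens * min(keys_per_query, n_tokens)


def implied_keys_per_query(n_tokens: int, reduction: float) -> float:
    """Keys-per-query budget that would yield a given compute reduction
    under this simplified model (assumption: cost scales with pair count)."""
    return n_tokens / reduction


n = 1_000_000      # context length quoted in the summary
reduction = 64.5   # compute reduction quoted in the summary

k = round(implied_keys_per_query(n, reduction))
ratio = dense_attention_pairs(n) / sparse_attention_pairs(n, k)

print(f"keys per query under this model: {k}")
print(f"dense / sparse pair ratio: {ratio:.2f}")
```

Under these assumptions, a 64.5x reduction at 1M tokens would correspond to each query scoring on the order of fifteen thousand keys instead of all one million. A real subquadratic method would likely have a budget that varies with context length, so this number should not be read as a property of the model.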
