Subquadratic AI introduces SubQ-1.1-Small using Smart Sparse Attention

16 hours ago

SubQ-1.1-Small reports near-perfect long-context retrieval on a needle-in-a-haystack test at context lengths up to 12M tokens, with attention compute reduced by up to nearly 1,000x. At 1M tokens, it requires 64.5x less compute than dense attention and runs 56x faster than FlashAttention-2.
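To put the 1M-token figure in perspective, here is a rough back-of-the-envelope sketch. It is not Subquadratic AI's method, which the announcement does not describe. It assumes attention cost is proportional to the number of query-key pairs scored, and that the reported 64.5x reduction comes entirely from scoring fewer pairs. The function names are hypothetical and exist only for this illustration.

```python
def dense_attention_pairs(n_tokens: int) -> int:
    """Query-key pairs scored by dense attention: every token attends to every token."""
    return n_tokens * n_tokens


def implied_keys_per_query(n_tokens: int, reduction: float) -> float:
    """Average keys scored per query, ASSUMING the whole compute reduction
    comes from scoring fewer query-key pairs (an assumption, not a claim
    from the announcement)."""
    sparse_pairs = dense_attention_pairs(n_tokens) / reduction
    return sparse_pairs / n_tokens


n = 1_000_000        # context length quoted in the announcement
reduction = 64.5     # reported compute reduction vs. dense attention at 1M tokens

print(f"dense pairs at 1M tokens: {dense_attention_pairs(n):.2e}")
print(f"implied keys per query:   {implied_keys_per_query(n, reduction):,.0f}")
```

Under those assumptions, each query would score on the order of 15,500 keys instead of all 1,000,000. The real mechanism may distribute its savings differently, so treat this only as a sense of scale.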
