Launch | AI Models
16 hours ago
Subquadratic AI introduces SubQ-1.1-Small using Smart Sparse Attention
Near-perfect long-context retrieval at up to 12M tokens on a needle-in-a-haystack test, with up to a nearly 1,000x reduction in attention compute. At 1M tokens, the model requires 64.5x less compute than dense attention and runs 56x faster than FlashAttention-2.
