Analysis · AI Models
2 days ago
Featured
Road to 5 Million Tokens: Techniques for long-context training
Max Ryabinin of Together AI details techniques for training transformer models with context lengths of up to 5 million tokens. The talk covers fully sharded data parallelism, ring attention, and other optimizations for overcoming memory limits on a single 8xH100 node.