Together AI presents techniques for 5M-token context training

Jun 8, 5:00 PM

Training a LLaMA 3B model with a 3-million-token context on a single 8×H100 node exhausts GPU memory before training starts. The talk covers fully sharded data parallelism and other techniques for reaching multi-million-token contexts.
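A rough back-of-envelope estimate suggests why memory runs out. The sketch below is not the talk's own accounting. It assumes a hidden size of 3072 and 28 layers (plausible for a 3B model, but not stated in the summary), bf16 activations, mixed-precision Adam at roughly 16 bytes per parameter, 80 GB per H100, and full activation checkpointing that keeps only one hidden-state tensor per layer. All function names are hypothetical.

```python
# Back-of-envelope memory estimate; every model dimension here is an assumption.
GIB = 1024 ** 3
H100_MEMORY_BYTES = 80 * 10 ** 9  # assumed 80 GB H100 variant


def model_state_bytes_per_gpu(n_params: int, n_gpus: int, bytes_per_param: int = 16) -> int:
    """Weights, gradients, and Adam state, sharded evenly across GPUs (as FSDP does).

    16 bytes/param assumes bf16 weights and grads (2 + 2) plus fp32 master
    weights, momentum, and variance (4 + 4 + 4).
    """
    return n_params * bytes_per_param // n_gpus


def checkpointed_activation_bytes(seq_len: int, hidden_size: int, n_layers: int,
                                  bytes_per_value: int = 2) -> int:
    """One saved hidden-state tensor per layer for a single sequence.

    This is a lower bound: it ignores attention workspace, logits, and the
    activations recomputed inside each layer during the backward pass.
    """
    return seq_len * hidden_size * n_layers * bytes_per_value


state = model_state_bytes_per_gpu(n_params=3_000_000_000, n_gpus=8)
acts = checkpointed_activation_bytes(seq_len=3_000_000, hidden_size=3072, n_layers=28)

print(f"sharded model state per GPU: {state / GIB:.1f} GiB")
print(f"checkpointed activations for one sequence: {acts / GIB:.1f} GiB")
print(f"fits on one H100 without splitting the sequence: {state + acts <= H100_MEMORY_BYTES}")
```

Under these assumptions, sharding the model state with FSDP leaves it at a few GiB per GPU, while the saved activations for a single 3-million-token sequence come to several hundred GiB. That gap is why sharding parameters alone is not enough and the sequence itself has to be split or offloaded.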
