Together AI presents techniques for 5M-token context training

Training a LLaMA 3B model with a 3-million-token context on a single 8×H100 node runs out of GPU memory before training even begins. The talk covers fully sharded data parallelism (FSDP) and other techniques for reaching multi-million-token contexts.
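A rough memory estimate helps explain why a 3-million-token context overwhelms the hardware even though a 3B model's weights are small. The sketch below is a back-of-envelope calculation, not the method presented in the talk. It assumes Llama-3.2-3B-like dimensions (hidden size 3072, 28 layers, a 128,256-token vocabulary), 80 GB of memory per H100, mixed-precision Adam at 16 bytes per parameter, and naively materialized fp32 logits. All constants and helper names are hypothetical.

```python
"""Back-of-envelope GPU memory estimate for long-context training.

All dimensions are ASSUMPTIONS (roughly Llama-3.2-3B-like), not figures from the talk.
"""

GB = 1e9  # decimal gigabytes

# Hypothetical model and hardware assumptions.
N_PARAMS = 3e9
HIDDEN = 3072
N_LAYERS = 28
VOCAB = 128_256
N_GPUS = 8
HBM_PER_GPU_GB = 80
SEQ_LEN = 3_000_000


def sharded_state_gb(n_params, n_gpus, bytes_per_param=16):
    """Per-GPU weights + grads + Adam state under full sharding (FSDP / ZeRO-3).

    16 bytes/param assumes bf16 weights (2) and grads (2), fp32 master
    weights (4), and two fp32 Adam moments (4 + 4).
    """
    return n_params * bytes_per_param / n_gpus / GB


def layer_checkpoints_gb(seq_len, hidden, n_layers, bytes_per_elem=2):
    """One bf16 hidden-state tensor per layer for a single sequence.

    This is a lower bound on what per-layer activation checkpointing keeps.
    """
    return seq_len * hidden * n_layers * bytes_per_elem / GB


def logits_gb(seq_len, vocab, bytes_per_elem=4):
    """Naively materialized fp32 output logits for a single sequence."""
    return seq_len * vocab * bytes_per_elem / GB


if __name__ == "__main__":
    state = sharded_state_gb(N_PARAMS, N_GPUS)
    ckpt = layer_checkpoints_gb(SEQ_LEN, HIDDEN, N_LAYERS)
    logits = logits_gb(SEQ_LEN, VOCAB)
    print(f"sharded model state per GPU: {state:8.1f} GB")
    print(f"per-layer checkpoints:       {ckpt:8.1f} GB")
    print(f"fp32 logits:                 {logits:8.1f} GB")
    print(f"HBM per GPU:                 {HBM_PER_GPU_GB:8.1f} GB")
```

Under these assumptions, FSDP shrinks weights, gradients and optimizer state to about 6 GB per GPU. It does not shard activations, though: each rank still processes its own full sequence. Keeping just one bf16 hidden state per layer then costs about 516 GB, and naive fp32 logits about 1.5 TB, against 80 GB per GPU. Activation memory that grows with sequence length is the part FSDP alone does not address.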

Jun 8, 5:00 PM