Jun 8, 5:00 PM
Together AI presents techniques for 5M-token context training
Training a LLaMA 3B model with a 3-million-token context on a single 8×H100 node exhausts GPU memory before training even starts. The talk covers fully sharded data parallelism (FSDP) and other techniques for reaching multi-million-token contexts.
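Some back-of-envelope arithmetic helps show why memory runs out. The sketch below is not from the talk. It assumes roughly Llama-3.2-3B-like dimensions (3e9 parameters, hidden size 3072, vocabulary 128,256), 80 GB H100s, bf16 activations, and mixed-precision Adam at 16 bytes per parameter, fully sharded across the 8 GPUs as FSDP would do. All of these values are assumptions chosen for illustration.

```python
# Back-of-envelope GPU memory estimate for long-context training.
# ASSUMPTIONS: model dimensions are roughly Llama-3.2-3B-like and are not taken
# from the talk; H100s are assumed to be the 80 GB variant.

GB = 1e9  # decimal gigabytes

PARAMS = 3e9          # assumed parameter count
HIDDEN = 3072         # assumed hidden size
VOCAB = 128_256       # assumed vocabulary size
NUM_GPUS = 8          # one 8xH100 node
GPU_MEM_GB = 80       # assumed 80 GB per H100
BF16_BYTES = 2


def model_state_gb_per_gpu(params: float, num_gpus: int) -> float:
    """Mixed-precision Adam: bf16 weights (2) + bf16 grads (2) + fp32 master
    weights, momentum, and variance (4 + 4 + 4) = 16 bytes/param,
    split evenly across GPUs as with full (FSDP-style) sharding."""
    return params * 16 / num_gpus / GB


def hidden_state_gb(seq_len: int, hidden: int) -> float:
    """Size of a single bf16 [seq_len, hidden] activation tensor."""
    return seq_len * hidden * BF16_BYTES / GB


def logits_gb(seq_len: int, vocab: int) -> float:
    """Size of the full bf16 [seq_len, vocab] logits if materialized at once."""
    return seq_len * vocab * BF16_BYTES / GB


if __name__ == "__main__":
    seq_len = 3_000_000
    node_mem_gb = NUM_GPUS * GPU_MEM_GB
    print(f"model states per GPU (sharded): {model_state_gb_per_gpu(PARAMS, NUM_GPUS):.1f} GB")
    print(f"one hidden-state tensor:        {hidden_state_gb(seq_len, HIDDEN):.1f} GB")
    print(f"full logits tensor:             {logits_gb(seq_len, VOCAB):.1f} GB")
    print(f"total node memory:              {node_mem_gb} GB")
```

Under these assumptions, sharding brings the model and optimizer state down to about 6 GB per GPU. A single bf16 hidden-state tensor at 3M tokens is already about 18 GB, however, and a naively materialized logits tensor (about 770 GB) exceeds the node's entire 640 GB. Activations, not weights, are what need additional techniques beyond parameter sharding.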