AnalysisAI Models
Jun 11, 10:54 PM
Untied Ulysses enables 8B/32B training with 25% longer sequences

Together AI
@togethercomputeAccelerate inference, model shaping, and pre-training on a research-optimized platform.
San Francisco, CAtogether.ai

Together AI
@togethercompute
Training a Llama 3B model with a 3M token context on a single 8xH100 node fails because model parameters alone exhaust GPU memory. @m_ryabinin explains how Untied Ulysses, his team's latest research, pushes past that wall, training at 8B and 32B scale with 25% longer sequences
·
Jun 11, 10:54 PM