Untied Ulysses enables 8B/32B training with 25% longer sequences

AnalysisAI Models

Jun 11, 10:54 PM

Untied Ulysses enables 8B/32B training with 25% longer sequences

Together AI

@togethercompute

Accelerate inference, model shaping, and pre-training on a research-optimized platform.

San Francisco, CAtogether.ai

View on X

Together AI

@togethercompute

Training a Llama 3B model with a 3M token context on a single 8xH100 node fails because model parameters alone exhaust GPU memory. @m_ryabinin explains how Untied Ulysses, his team's latest research, pushes past that wall, training at 8B and 32B scale with 25% longer sequences

Jun 11, 10:54 PM