1 day ago
Dan Fu guest lectures on LLM inference at Stanford

Together AI
@togethercompute
Accelerate inference, model shaping, and pre-training on a research-optimized platform.
San Francisco, CA
together.ai

@realDanFu guest lectured in @percyliang's CS336 at Stanford. Check out what he covered:
→ The life of a token: KV cache, prefill/decode disaggregation, and what inference looks like at scale
→ Megakernels: fusing GPU ops to hit near speed-of-light LLM decode
→ Parcae: why