Dan Fu guest lectures on LLM inference at Stanford

1 day ago

Together AI

@togethercompute

Accelerate inference, model shaping, and pre-training on a research-optimized platform.

San Francisco, CA

together.ai

@realDanFu guest lectured in @percyliang's CS336 at Stanford, check out what he covered:
→ The life of a token: KV cache, prefill/decode disaggregation, and what inference looks like at scale
→ Megakernels: fusing GPU ops to hit near speed-of-light LLM decode
→ Parcae: why
