
Dan Fu guest lectures on LLM inference at Stanford

Together AI
@togethercompute

@realDanFu guest lectured in @percyliang's CS336 at Stanford. Check out what he covered:
→ The life of a token: KV cache, prefill/decode disaggregation, and what inference looks like at scale
→ Megakernels: fusing GPU ops to hit near speed-of-light LLM decode
→ Parcae: why

1 day ago