mKernel library slashes GPU communication overhead in AI workloads


12 days ago


mKernel, a multi-GPU fused kernel library, reduces communication overhead that can consume 43.6% of forward-pass time in mixture-of-experts (MoE) models. It aims to accelerate distributed training by fusing computation and communication across nodes.
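
To put the 43.6% figure in perspective, here is a rough back-of-the-envelope sketch, not mKernel's API or its measured results. It assumes the forward pass splits cleanly into compute and communication, and that fusion hides some share of communication behind compute. The function name `forward_speedup_bound` and the `hidden_fraction` parameter are hypothetical, introduced only for illustration.

```python
def forward_speedup_bound(comm_fraction: float, hidden_fraction: float = 1.0) -> float:
    """Idealized forward-pass speedup when part of communication overlaps compute.

    comm_fraction:   share of forward-pass time spent on communication (0.436 in the article).
    hidden_fraction: assumed share of that communication hidden by overlap (hypothetical knob).
    """
    if not 0.0 <= comm_fraction < 1.0:
        raise ValueError("comm_fraction must be in [0, 1)")
    if not 0.0 <= hidden_fraction <= 1.0:
        raise ValueError("hidden_fraction must be in [0, 1]")

    compute = 1.0 - comm_fraction
    # Overlap cannot hide more communication than there is compute to hide it behind.
    hidden = min(hidden_fraction * comm_fraction, compute)
    remaining = compute + comm_fraction - hidden
    return 1.0 / remaining


if __name__ == "__main__":
    # Perfect overlap of the 43.6% communication share.
    print(round(forward_speedup_bound(0.436), 2))       # 1.77
    # Only half of the communication hidden.
    print(round(forward_speedup_bound(0.436, 0.5), 2))  # 1.28
```

Under these assumptions, even perfect overlap caps the forward-pass gain at about 1.77x, because the remaining 56.4% of compute time is untouched. Real gains depend on how much communication a fused kernel actually hides.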
