12 days ago
mKernel library slashes GPU communication overhead in AI workloads
mKernel, a multi-GPU fused kernel library, reduces the communication overhead that can consume 43.6% of forward-pass time in mixture-of-experts (MoE) models. It aims to accelerate distributed training by fusing computation and communication across nodes.
