ParallelKernelBench: frontier LLMs struggle with fast multi-GPU CUDA kernels

21 hours ago

The benchmark spans 87 real-world workloads, and the best model solves fewer than a third of them. A few generated kernels, however, outperform any existing public implementation.